python, list, beautifulsoup, element

Why is soup.find_all() only returning one result?


I'm scraping some information. Below is my code:

from bs4 import BeautifulSoup
import requests

url = "https://www.privateproperty.com.ng/property-for-sale"
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find_all('div', class_="similar-listings-item sponsored-listing")

for result in results:
    Title = result.find('div', class_= "similar-listings-info").text.replace('\n','')
    location = result.find( class_= "listings-location").text.replace('\n','')
    Price = result.find('div', class_= "similar-listings-price").text.replace('\n','')
    

info = (Title, location, Price)
print(info)

Why does this line

results = soup.find_all('div', class_="similar-listings-item sponsored-listing") 

return only the 1st element?


Solution

  • Why does this line

    results = soup.find_all('div', class_="similar-listings-item sponsored-listing") 
    

    return only the 1st element?

    I'm getting 2 elements, but you may only be seeing the last result because the info = ... and print(info) lines are after the loop instead of inside it. Indent them so they run inside the loop, and every result will be printed.
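A minimal sketch of the indentation fix, run against a small hypothetical HTML snippet so no network request is needed. The markup only imitates the class names from the question; it is not the real privateproperty.com.ng page. With info and print(info) inside the loop, one tuple is printed per listing:

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names used in the question.
html = """
<div class="similar-listings-item sponsored-listing">
  <div class="similar-listings-info">3 bedroom flat</div>
  <p class="listings-location">Lekki, Lagos</p>
  <div class="similar-listings-price">N 50,000,000</div>
</div>
<div class="similar-listings-item sponsored-listing">
  <div class="similar-listings-info">Land</div>
  <p class="listings-location">Ikeja, Lagos</p>
  <div class="similar-listings-price">N 20,000,000</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("div", class_="similar-listings-item sponsored-listing")

printed = []
for result in results:
    title = result.find("div", class_="similar-listings-info").text.strip()
    location = result.find(class_="listings-location").text.strip()
    price = result.find("div", class_="similar-listings-price").text.strip()
    info = (title, location, price)
    print(info)  # inside the loop, so this runs once per listing
    printed.append(info)
```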


    If your issue is that you want all the listings, note that only the sponsored listings have the sponsored-listing class, and a class_ string that contains a space is compared against the whole class attribute value. To get all the listings, you can try using:

    results = soup.find_all('div', {'class': "similar-listings-item"}) ## OR
    # results = soup.select('div.similar-listings-item')
    

    [Use soup.select('div.similar-listings-item:not(.sponsored-listing)') if you only want unsponsored listings. Check out how to use .select with CSS selectors for more details.]
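To see the class-matching difference in isolation, here is a sketch on hypothetical markup with one sponsored and one regular listing. It assumes bs4 4.7 or later, where .select is backed by soupsieve and supports :not():

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one sponsored listing and one regular listing.
html = """
<div class="similar-listings-item sponsored-listing">Sponsored flat</div>
<div class="similar-listings-item">Regular duplex</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string is matched against the exact class attribute value,
# so only the sponsored listing is found.
sponsored = soup.find_all("div", class_="similar-listings-item sponsored-listing")

# A single class name matches any div whose class list contains it.
all_items = soup.find_all("div", class_="similar-listings-item")

# CSS selector: items that do NOT also carry the sponsored-listing class.
unsponsored = soup.select("div.similar-listings-item:not(.sponsored-listing)")

print([d.get_text() for d in sponsored])    # ['Sponsored flat']
print([d.get_text() for d in all_items])    # ['Sponsored flat', 'Regular duplex']
print([d.get_text() for d in unsponsored])  # ['Regular duplex']
```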


    I want to extract list of lists from the (variable)

    Which variable? If you want a list of the Title, location and Price for each result, initialize an empty list (like infoList) before the loop, indent info = ... so that it runs inside the loop, and append info to infoList at the end of each iteration (still inside the loop). Something like:

    infoList = []
    for result in results:
        Title = result.find('div', class_= "similar-listings-info").text.replace('\n','')
        location = result.find( class_= "listings-location").text.replace('\n','')
        Price = result.find('div', class_= "similar-listings-price").text.replace('\n','')
    
        info = (Title, location, Price) # this is a tuple, so:
        # infoList.append(info) # --> list of tuples
        infoList.append([Title, location, Price]) # --> list of lists
        # print(info) # will print for every result
    print(info) # will print ONLY the LAST result
    

    Btw, it's not very safe to chain .find and .text like that. If .find doesn't find anything, it returns None, and reading .text on None raises an AttributeError. To be more cautious, check that find returned something before using it.
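A sketch of both the failure and the guard, using a hypothetical listing that has no price element:

```python
from bs4 import BeautifulSoup

# Hypothetical listing with a title but no price element.
html = '<div class="similar-listings-item"><div class="similar-listings-info">Flat</div></div>'
item = BeautifulSoup(html, "html.parser").find("div", class_="similar-listings-item")

# Unsafe: find() returns None when nothing matches, and None has no .text.
error_message = None
try:
    item.find("div", class_="similar-listings-price").text
except AttributeError as exc:
    error_message = str(exc)  # "'NoneType' object has no attribute 'text'"

# Safer: check the result of find() before reading .text.
price_tag = item.find("div", class_="similar-listings-price")
price = price_tag.text.strip() if price_tag is not None else None
```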

    You could use my selectForList function, as in infoList = [selectForList(result, ['div.similar-listings-info', 'p.listings-location', 'div.similar-listings-price']) for result in results], or, since you want to remove the \ns (and in case you'd rather not use CSS selectors), use a variation of it:

    def get_min_text(containerTag, elName, classAttr, defaultVal=None):
        el = containerTag.find(elName, class_=classAttr)
        if el is None: return defaultVal
        return ' '.join(el.get_text(' ').split()) # split+join minimizes whitespace
    
    results = soup.find_all('div', {'class': "similar-listings-item"}) 
    infoList = [[get_min_text(result, *c) for c in [
        ('div', 'similar-listings-info'), # Title
        ('p', 'listings-location'), # Location
        ('div', 'similar-listings-price') # Price
    ]] for result in results]
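A self-contained run of get_min_text on hypothetical markup (messy whitespace, a nested span in the price, and a second listing with no price at all), to show what the list of lists looks like and that a missing element becomes None instead of raising:

```python
from bs4 import BeautifulSoup

def get_min_text(containerTag, elName, classAttr, defaultVal=None):
    el = containerTag.find(elName, class_=classAttr)
    if el is None:
        return defaultVal  # element missing: return the default instead of raising
    return " ".join(el.get_text(" ").split())  # split+join minimizes whitespace

# Hypothetical markup, not the real page.
html = """
<div class="similar-listings-item sponsored-listing">
  <div class="similar-listings-info">
     3 bedroom
     flat
  </div>
  <p class="listings-location">Lekki,
     Lagos</p>
  <div class="similar-listings-price"><span>N</span>50,000,000</div>
</div>
<div class="similar-listings-item">
  <div class="similar-listings-info">Plot of land</div>
  <p class="listings-location">Ikeja, Lagos</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

results = soup.find_all("div", {"class": "similar-listings-item"})
infoList = [[get_min_text(result, *c) for c in [
    ("div", "similar-listings-info"),   # Title
    ("p", "listings-location"),         # Location
    ("div", "similar-listings-price"),  # Price
]] for result in results]

print(infoList)
# [['3 bedroom flat', 'Lekki, Lagos', 'N 50,000,000'], ['Plot of land', 'Ikeja, Lagos', None]]
```

Passing ' ' to get_text keeps text from nested tags (like the span) from being glued together; split() and join() then collapse the runs of spaces and newlines.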
    
