python web-scraping beautifulsoup attributeerror

BS4: Attribute Error in Web Scraping with Python


I need to extract the name of the city where each shop is located from this website. I wrote this code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

def get_page_data(number):
    print('number:', number)

    url = 'https://www.biedronka.pl/pl/sklepy/lista,lat,52.25,lng,21,page,{}'.format(number)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    container = soup.find(class_='s-content shop-list-page')
    items = container.find_all(class_='shopListElement')

    dane = []
    for item in items:
        miasto = item.find(class_='h4').get_text(strip=True)
        adres = item.find(class_='shopFullAddress').get_text(strip=True)
        dane.append([miasto, adres])

    return dane

wszystkie_dane = []
for number in range(1, 2):
    dane_na_stronie = get_page_data(number)

    wszystkie_dane.extend(dane_na_stronie)

dane = pd.DataFrame(wszystkie_dane, columns=['miasto','adres'])

dane.to_csv('biedronki_lista.csv', index=False)

The problem appears in:

   miasto = item.find(class_='h4').get_text(strip=True)
AttributeError: 'NoneType' object has no attribute 'get_text'

Any ideas how to extract the name of the city (in the h4) from this website?


Solution

  • Try using:

    miasto = item.find('h4').text.split()[0]
    

    Or, to get the full text of the h4 (this also includes the text of the nested address span):

    miasto = item.find('h4').get_text(strip=True)
    

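    Note:

    "h4" is a tag, not a class. item.find(class_='h4') looks for an element whose class attribute is "h4", finds none, and returns None, so calling .get_text() on the result raises the AttributeError.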
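
    The difference between the two lookups can be checked on a small snippet. This is a minimal sketch: the markup is made up to resemble the h4 shown in this answer (the address text is a placeholder), and it is not the real page.

```python
from bs4 import BeautifulSoup

# Made-up markup modelled on the h4 from this answer; the address is a placeholder.
html = (
    '<li class="shopListElement">'
    '<h4 style="margin-bottom: 10px;">Rzeszów '
    '<span class="shopFullAddress">ul. Przykładowa 1</span></h4>'
    '</li>'
)
item = BeautifulSoup(html, 'html.parser').find(class_='shopListElement')

by_class = item.find(class_='h4')  # searches for class="h4": nothing matches
by_tag = item.find('h4')           # searches for an <h4> element

print(by_class)     # None
print(by_tag.name)  # h4
```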


    Explanation:

    • Calling .find('h4') returns:
    <h4 style="margin-bottom: 10px;">
    
                    Rzeszów             <span class="shopFullAddress">ul.<span class="shopAddress"> </span></span>
    
    • Accessing .text on that tag returns:
    'Rzeszów            \tul.'
    
    • Calling .split() on that string returns:
    ['Rzeszów', 'ul.']
    
    • The first element of that list, [0], is the city name.
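
    One caveat with split()[0]: it keeps only the first whitespace-separated word, so a multi-word city name would be cut short. The sketch below shows one possible alternative, assuming the city is always the first text node sitting directly inside the h4 (before the nested span). The markup and the city "Nowy Sącz" are illustrative, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Illustrative markup with a two-word city; the address is a placeholder.
html = '''<h4 style="margin-bottom: 10px;">
    Nowy Sącz   <span class="shopFullAddress">ul. Przykładowa 1</span>
</h4>'''
h4 = BeautifulSoup(html, 'html.parser').find('h4')

# split()[0] keeps only the first word of the whole h4 text
first_word = h4.text.split()[0]

# string=True with recursive=False matches only text that is a direct child
# of the h4, so the text inside the nested span is ignored
city = h4.find(string=True, recursive=False).strip()

print(first_word)  # Nowy
print(city)        # Nowy Sącz
```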

    Apply the tag-versus-class fix wherever this error appears in the code:

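    dane = []
    for item in items:
        # h4 is a tag name, so pass it as the first argument, not as class_
        miasto = item.find('h4').text.split()[0]
        # shopFullAddress is a CSS class, so it has to be passed as class_
        adres = item.find(class_='shopFullAddress').get_text(strip=True)
        dane.append([miasto, adres])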
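
Since find() returns None whenever nothing matches, the same AttributeError can come back if any list element on the page lacks an h4 or an address. A minimal sketch of a guarded loop follows; parse_items is a hypothetical helper name and the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

def parse_items(items):
    """Return [city, address] rows, skipping items that lack either element."""
    rows = []
    for item in items:
        h4 = item.find('h4')
        address = item.find(class_='shopFullAddress')
        if h4 is None or address is None:
            # find() found nothing; skip instead of raising AttributeError
            continue
        rows.append([h4.text.split()[0], address.get_text(strip=True)])
    return rows

# Made-up sample: the second list element has no h4 at all.
sample = '''
<ul>
<li class="shopListElement"><h4>Rzeszów <span class="shopFullAddress">ul. Przykładowa 1</span></h4></li>
<li class="shopListElement"><p>no heading here</p></li>
</ul>'''
items = BeautifulSoup(sample, 'html.parser').find_all(class_='shopListElement')
rows = parse_items(items)
print(rows)  # [['Rzeszów', 'ul. Przykładowa 1']]
```

Skipping silently is only one option; logging the offending item or storing an empty string may suit the data better.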