
Beautifulsoup selector in Python returns blank result set for valid selector

We want to scrape some content from this webpage. The element we are interested in is div.white-bg-border-radius-kousik.shadow-kousik-effect.mb-2.


For this, we are trying to use this selector in BeautifulSoup (Python), but it does not work. I tried three or four variants and none of them worked either, even though the HTML shows that this element is present 36 times in the page. The selectors return either an empty set or 2-3 results, so I am obviously missing something. I need to find the right way of doing it.

from bs4 import BeautifulSoup
import urllib.request

url = ""
with urllib.request.urlopen(url) as response:
    html = str(response.read())
soup = BeautifulSoup(html, 'html.parser')
elements = soup.find_all('div.white-bg-border-radius-kousik.shadow-kousik-effect.mb-2')  # returns an empty set
elements2 = soup.find_all('div', class_=['shadow-kousik-effect', 'mb-2'])  # returns just 3 elements, although searching for a subset of the 3 classes should return at least 36
elements3 = soup.select('div.shadow-kousik-effect')  # returns just 3 results
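For comparison, here is a minimal sketch on a small hypothetical snippet (not the real page) showing what each kind of call matches: find_all() reads its first argument as a tag name, not as a CSS selector; a list passed to class_ matches elements carrying any of the listed classes; and select() is the method that takes a CSS selector, where chained classes must all be present.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<div class="white-bg-border-radius-kousik shadow-kousik-effect mb-2">A</div>
<div class="shadow-kousik-effect">B</div>
<div class="mb-2">C</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() compares its first argument with tag names, so a CSS string matches nothing.
by_name = soup.find_all("div.white-bg-border-radius-kousik.shadow-kousik-effect.mb-2")

# A list passed to class_ matches elements that carry ANY of the listed classes.
any_class = soup.find_all("div", class_=["shadow-kousik-effect", "mb-2"])

# select() parses a CSS selector; chained classes require ALL of them on one element.
all_classes = soup.select("div.white-bg-border-radius-kousik.shadow-kousik-effect.mb-2")

print(len(by_name), len(any_class), len(all_classes))
```

On a correctly parsed page, the class_ list variant therefore returns a superset of the select() result rather than a subset, so a very small count from either call points at the parsing step rather than at the selector.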


  • I think the problem is how you read the response: on my machine, the resulting string contains tags with a trailing literal \r\n:

    <div\r\n class="white-bg-border-radius-kousik shadow-kousik-effect mb-2">
     <a \r\n="" class="nounderline" href="/world...>
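    A minimal sketch of why this happens, assuming the page was read with str(response.read()): str() on a bytes object returns its repr, so each CRLF becomes the literal characters \r\n and html.parser takes div\r\n (backslashes included) as the tag name. The byte string below is a hypothetical stand-in for the real page, and fetch_soup is a hypothetical helper that decodes the bytes instead, using the charset from the Content-Type header and assuming UTF-8 when none is declared.

```python
import urllib.request

from bs4 import BeautifulSoup

# Hypothetical raw bytes, as response.read() might return them, with a CRLF inside the tag.
raw = b'<div\r\n class="shadow-kousik-effect mb-2">A</div>'

# str() on bytes returns their repr: CRLF becomes the four characters \r\n,
# so html.parser reads the tag name as "div\r\n" and no <div> exists.
broken = BeautifulSoup(str(raw), "html.parser")

# Decoding turns CRLF back into real whitespace, so the tag is parsed as <div>.
fixed = BeautifulSoup(raw.decode("utf-8"), "html.parser")


def fetch_soup(url: str) -> BeautifulSoup:
    """Hypothetical helper: download url with urllib and parse the decoded text."""
    with urllib.request.urlopen(url) as response:
        body = response.read()  # bytes, not str
        # Prefer the charset declared in Content-Type; assume UTF-8 when none is given.
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps parsing going if a few bytes do not match the charset.
    return BeautifulSoup(body.decode(charset, errors="replace"), "html.parser")
```

    Passing the bytes straight to BeautifulSoup(body, "html.parser") should also work, since Beautiful Soup detects the encoding of byte input itself; the explicit decode just makes the charset choice visible.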

    With requests, your CSS selector returns 35 elements (the search box is excluded).

    import requests
    from bs4 import BeautifulSoup

    url = ""
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    css = "div.white-bg-border-radius-kousik.shadow-kousik-effect.mb-2"
    regions = [list(tag.stripped_strings) for tag in soup.select(css)]

    Output:

    # len(regions) # 35
        ['ANDAMAN & NICOBAR ISLANDS', '102 Branches'],
        ['ANDHRA PRADESH', '10493 Branches'],
        ['ARUNACHAL PRADESH', '302 Branches'],
        ['ASSAM', '4022 Branches'],
        ['BIHAR', '9113 Branches'],