Tags: python, selenium, css-selectors, siblings, nextsibling

How to find all next siblings from a particular class using css selectors


I want to scrape the Hotel Association Data website and need help with the CSS selector. I am trying to extract the address from each listing, as in the HTML shown further down.

Data I want to scrape: 20 West 29th Street and New York, NY 10001

Using Next Sibling Method

I know the + combinator selects the next sibling, but the problem here is that neither address line has any attribute associated with it. I don't want to use XPath here; I want a generic CSS selector that finds all the siblings of .hanyccompany so that I can extract their text.

Can anyone tell me how to find all the siblings of class='hanyccompany'?

<span class="hanyccompany"><a href="http://www.acehotel.com/" target="_blank">ACE HOTEL NEW YORK</a></span><br />
20 West 29th Street<br />
New York, NY 10001<br />

Solution

  • You can parse and extract data easily using BeautifulSoup.

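    from bs4 import BeautifulSoup
    from mechanize import Browser

    # Fetch the page with a browser-like User-agent header.
    br = Browser()
    br.addheaders = [('User-agent', 'Firefox')]
    response = br.open("http://www.hanyc.org/members/hotels/")

    web_data = response.read()

    # Parse the HTML and collect every span that holds a hotel name.
    soup = BeautifulSoup(web_data, "html.parser")
    tags = soup.find_all('span', attrs={"class": "hanyccompany"})

    # The span's parent wraps the whole listing, so its text includes the address.
    for tag in tags:
        print(tag.parent.text)
        print("------------------------------")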

    If you print the text of the span's parent, you'll get something like:

    ACE HOTEL NEW YORK
    20 West 29th Street
    New York, NY 10001
    Jan Rozenveld, Managing Director
    (212) 679-2222
    (212) 679-1947
    [email protected]
    
    ...
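
CSS selectors match elements only, so the two address lines (bare text nodes between the <br /> tags) cannot be selected directly, whether with + or with ~. As a minimal sketch, assuming the markup from the question, one option is to locate the span and then walk its next_siblings in BeautifulSoup, keeping only the text nodes. The wrapping <div> and the helper name address_lines are illustrative assumptions, not taken from the site.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup copied from the question. The outer <div> is an assumption
# standing in for whatever parent element the real page uses.
html = """<div><span class="hanyccompany"><a href="http://www.acehotel.com/" target="_blank">ACE HOTEL NEW YORK</a></span><br />
20 West 29th Street<br />
New York, NY 10001<br />
</div>"""


def address_lines(span, limit=2):
    """Return the first `limit` non-empty text nodes that follow `span`.

    Hypothetical helper: it assumes the address is always the first two
    text lines after the hotel name, as in the sample listing.
    """
    lines = []
    for node in span.next_siblings:
        # Skip <br /> tags; keep only bare text nodes.
        if isinstance(node, NavigableString):
            text = node.strip()
            if text:
                lines.append(text)
        if len(lines) == limit:
            break
    return lines


soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="hanyccompany")
address = address_lines(span)
print(address)  # ['20 West 29th Street', 'New York, NY 10001']
```

On the full page, the same helper could be called for each span returned by find_all. The limit of two lines is a guess based on the single listing shown, so listings with a different layout may need a different cutoff.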