python selenium-webdriver web-scraping beautifulsoup css-selectors

How to find HTML elements by multiple tags with Selenium


I need to scrape data from a webpage with Selenium, and I need to find these elements:

<div class="content-left">
    <ul></ul>
    <ul></ul>
    <p></p>
    <ul></ul>
    <p></p>
    <ul></ul>
    <p></p>
    <ul>
        <li></li>
        <li></li>
    </ul>
    <p></p>
</div>

As you can see, the <p> and <ul> tags have no classes, and I don't know how to get them in document order.

I used Beautiful Soup before:

allP = bs.find('div', attrs={"class":"content-left"})
txt = ""
for p in allP.find_all(['p', 'li']):
    txt += p.get_text()  # assumed loop body; the original snippet stops at the for line

But it no longer works, because requests now gets a 403 error from the site. So I need to find these elements with Selenium instead.



Solution

  • To extract the text of only the <p> and <li> tags, you can use Beautiful Soup as follows:

    from bs4 import BeautifulSoup
    
    html_text = '''
    <div class="content-left">
        <ul>1</ul>
        <ul>2</ul>
        <p>3</p>
        <ul>4</ul>
        <p>5</p>
        <ul>6</ul>
        <p>7</p>
        <ul>
            <li>8</li>
            <li>9</li>
        </ul>
        <p>10</p>
    </div>
    '''
    soup = BeautifulSoup(html_text, 'html.parser')
    parent_element = soup.find("div", {"class": "content-left"})
    for element in parent_element.find_all(['p', 'li']):
        print(element.text)
    

    Console output:

    3
    5
    7
    8
    9
    10
    

    Using Selenium

    With Selenium, you can collect the same texts in a list comprehension, assuming driver is a WebDriver that has already loaded the page:

    • Using CSS_SELECTOR:

      from selenium.webdriver.common.by import By
      print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.content-left p, div.content-left li")])
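
Since the question mentions that requests now gets a 403, one possible way to keep the existing Beautiful Soup logic is to let Selenium load the page and then parse driver.page_source. The following is only a minimal sketch under some assumptions: the helper names extract_texts and fetch_rendered_html are made up for illustration, Chrome is assumed to be installed, and the content is assumed to be present once the page has loaded (a page that fills it in later with JavaScript may need an explicit wait).

```python
from bs4 import BeautifulSoup


def extract_texts(html):
    """Return the text of every <p> and <li> inside div.content-left, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    parent = soup.find("div", {"class": "content-left"})
    if parent is None:  # container not found, e.g. an error page came back
        return []
    # find_all with a list of tag names walks the tree once, so order is preserved
    return [el.get_text(strip=True) for el in parent.find_all(["p", "li"])]


def fetch_rendered_html(url):
    """Load url in a real browser and return its HTML (hypothetical helper, assumes Chrome)."""
    from selenium import webdriver  # imported here so extract_texts works without Selenium

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

It could be used as extract_texts(fetch_rendered_html(url)), where url is the page being scraped. Keeping the parsing separate from the browser step means the parsing can be checked against a saved HTML string without starting a browser.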