python, selenium-webdriver, css-selectors

How to pair an element's children by CSS selector using Selenium WebDriver in Python?


I'm using Selenium WebDriver to scrape information from many web pages, and I wonder whether it's possible to select multiple child elements by CSS selector. The HTML structure looks like this:

<section id="education">
  <div class="degree">
    <h3 class="school"> School1 </h3>
    <p class="year"> 2002-2008 </p>
  </div>
  <div class="degree">
    <h3 class="school"> School2 </h3>
  </div>
</section>

In this case, I want to select the school names with their corresponding year ranges. But if I simply use:

driver.find_elements_by_css_selector('section[id="education"] h3[class="school"]')
driver.find_elements_by_css_selector('section[id="education"] p[class="year"]')

I will get two lists, ['School1', 'School2'] and ['2002-2008'], and I won't be able to tell which school the year range '2002-2008' belongs to. So, is it possible to keep each school name together with its year range? Other ways to get around the problem would be helpful too.


Solution

  • You have to loop through the .degree elements and extract the school and year from each one, so the values stay paired. Here's how to do it with the ID and class-name locators:

    education = driver.find_element_by_id("education")
    for degree in education.find_elements_by_class_name("degree"):
        school = degree.find_element_by_class_name("school")
        year = degree.find_element_by_class_name("year")
        print(school.text, year.text)
    

    And here's how to do it using CSS selectors:

    for degree in driver.find_elements_by_css_selector("#education .degree"):
        school = degree.find_element_by_css_selector(".school")
        year = degree.find_element_by_css_selector(".year")
        print(school.text, year.text)
    

    Note: as @Andersson commented, if .school or .year can be missing, you should check that each element exists using one of the methods mentioned in this answer. Otherwise, this code will throw a NoSuchElementException; in the sample HTML above, the second .degree block has no .year element.
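
    One possible way to do the existence check from the note is to call find_elements (plural), which returns an empty list instead of raising when nothing matches. The sketch below is only an illustration, not the method from the linked answer. It uses the Selenium 4 style find_elements(by, value), since recent Selenium 4 releases removed the find_element_by_* helpers. The helper names first_text and pair_schools_with_years are made up for this example, and the locator string "css selector" is the value of By.CSS_SELECTOR, written as a constant here so the snippet has no import requirements of its own.

```python
from typing import List, Optional, Tuple

# Locator strategy string; this is the value of
# selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def first_text(parent, selector: str) -> Optional[str]:
    """Return the stripped text of the first match under parent, or None."""
    # find_elements returns an empty list when nothing matches,
    # so no NoSuchElementException has to be caught here.
    matches = parent.find_elements(CSS_SELECTOR, selector)
    return matches[0].text.strip() if matches else None


def pair_schools_with_years(root) -> List[Tuple[Optional[str], Optional[str]]]:
    """Collect one (school, year) tuple per .degree block under #education.

    root is assumed to be a WebDriver or WebElement (anything that
    exposes find_elements(by, value)).
    """
    pairs = []
    for degree in root.find_elements(CSS_SELECTOR, "#education .degree"):
        # Searching from each degree element keeps school and year paired.
        pairs.append((first_text(degree, ".school"), first_text(degree, ".year")))
    return pairs
```

    Calling pair_schools_with_years(driver) on a page with the sample HTML should then give one tuple per degree, with None in place of the year for School2, so a missing element shows up as a gap rather than an exception.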