
How to deal with multiple lines in class name in html code


I need to find all elements in an HTML page that have a certain class name. For some reason, the class name spans multiple lines and contains multiple spaces. Here is the exact class name I copied from the browser's dev tools:

            ll-sets-words__row
            false
        

I've tried both Selenium and BeautifulSoup to look for those elements by class name, but that doesn't work. If I look for them by CSS selector or XPath instead, I get exactly one element, but I need all of them. That's why I want to search by class name, but that odd multi-line class name doesn't seem to work.

This is my code example:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import time
    import pickle
    from bs4 import BeautifulSoup

    url1 = 'https://lingualeo.com/'
    url2 ='https://lingualeo.com/ru/dictionary/vocabulary/my'

    s = Service('C:\\Users\\user\\Desktop\\chromedriver-win64\\chromedriver.exe')
    options = webdriver.ChromeOptions()

    options.add_argument('--excludeSwitches')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-blink-features=AutomationControlled')

    browser = webdriver.Chrome(service=s, options=options)
    browser.maximize_window()
    wait = WebDriverWait(browser, 600)
    browser.get(url1)
    time.sleep(7)

    cookies = pickle.load((open('lingua_cookies.pkl', 'rb')))
    for cookie in cookies:
        browser.add_cookie(cookie)
    time.sleep(2)
    browser.get(url2)

    time.sleep(10)

    class_name = """
                    ll-sets-words__row
                    false
                """

    try:
        entire_dict = browser.find_element(By.CLASS_NAME, 'll-page-vocabulary__sets-words__table')
        print('it worked here1')
        words = entire_dict.find_elements(By.CLASS_NAME, class_name)
        for e in words:
            print(e.text)
    except:
        print('error')
        browser.quit()
    browser.quit()

It stops working at the line where the variable 'words' is assigned. If I replace the search with By.CSS_SELECTOR or By.XPATH, it works, but it finds just one element. That's why I still need to look up all the elements using CLASS_NAME.


Solution

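  • The class name is not actually split across multiple lines. The class attribute holds a whitespace-separated list of class names, and line breaks count as whitespace just like spaces do. For example, the value "ll-sets-words__row false" means that the element belongs to two classes: ll-sets-words__row and false. To select this element, you can use either of these classes, or both of them together, depending on your needs. Note that By.CLASS_NAME expects a single class name, so passing the whole multi-line string will not match; pass ll-sets-words__row alone, or combine the classes in a CSS selector such as .ll-sets-words__row.false.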
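
To make the point about whitespace concrete, here is a minimal sketch that splits the copied attribute value into its individual class names and builds a compound CSS selector from them. The helper names class_tokens and css_selector_for are hypothetical, introduced only for illustration, and the sketch assumes the class names are plain identifiers that need no CSS escaping.

```python
def class_tokens(raw_class_attr):
    """Split a raw class attribute value into individual class names."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and ignores leading/trailing whitespace.
    return raw_class_attr.split()


def css_selector_for(raw_class_attr):
    """Build a compound CSS selector that requires every class to be present."""
    # Assumption: each class name is a plain identifier (no escaping needed).
    return "".join("." + name for name in class_tokens(raw_class_attr))


# The value exactly as copied from the dev tools in the question.
raw = """
            ll-sets-words__row
            false
        """

print(class_tokens(raw))      # ['ll-sets-words__row', 'false']
selector = css_selector_for(raw)
print(selector)               # .ll-sets-words__row.false

# With Selenium (assuming `entire_dict` is the table element from the question):
#     words = entire_dict.find_elements(By.CSS_SELECTOR, selector)
# or, matching on a single class only:
#     words = entire_dict.find_elements(By.CLASS_NAME, "ll-sets-words__row")
```

Either commented call uses find_elements (plural), which returns every match rather than only the first. If a CSS selector or XPath query still yields a single element, it may be worth checking whether find_element (singular) was called or whether the selector is more specific than intended. In BeautifulSoup, soup.select(selector) or soup.find_all(class_='ll-sets-words__row') should behave similarly.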