Selenium python: get all the <li> text of all the <ul> from a <div>

I would like to collect, from several pages, all the word pairs that are listed in the form "Dutch word = English word".

From examining the HTML, this means I need the text of every li in every ul under the child div of #mw-content-text.

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('headless')  # start chrome without opening window
driver = webdriver.Chrome(options=options)  # options= replaces the deprecated chrome_options=

listURL = [
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_2",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Phrases_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Negative_1",
]


list_text = []
for url in listURL:
    driver.get(url)
    # find_elements(By..., ...) replaces the find_elements_by_* helpers removed in Selenium 4
    elem = driver.find_elements(By.XPATH, '//*[@id="mw-content-text"]/div/ul')
    for each_ul in elem:
        all_li = each_ul.find_elements(By.TAG_NAME, "li")
        for li in all_li:
            list_text.append(li.text)

print(list_text)

Here is the output:

['man = man', 'vrouw = woman', 'jongen = boy', 'ik = I', 'ben = am', 'een = a/an', 'en = and', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

I don't understand why the text of some li elements is not retrieved even though their XPath is the same (I double-checked several of them with "Copy XPath" in the developer console).

Solution

Try waiting for the page to load fully before parsing it. One way is to call time.sleep() after each driver.get():

from time import sleep
...

for url in listURL:
    driver.get(url)
    sleep(5)
    ...

EDIT: Using BeautifulSoup:

import requests
from bs4 import BeautifulSoup


listURL = [
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_2",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Phrases_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Negative_1",
]


list_text = []
for url in listURL:
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    print("Link:", url)
    
    for tag in soup.select("[id*=Lesson]:not([id*=Lessons])"):
        print(tag.text)
        print()
        print(tag.find_next("ul").text)
        print("-" * 80)
    print()

Output (truncated):

Link: https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_1
Lesson 1

man = man
vrouw = woman
jongen = boy
ik = I
ben = am
een = a/an
en = and
--------------------------------------------------------------------------------
Lesson 2

meisje = girl
kind = child/kid
hij = he
ze = she (unstressed)
is = is
of = or
--------------------------------------------------------------------------------
Lesson 3

appel = apple

... and so on

If you want the output as a list:

for url in listURL:
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    print("Link:", url)
    print([tag.text for tag in soup.select(".mw-parser-output > ul li")])
    print("-" * 80)
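
Since the goal in the question is the "dutch word = english word" pairs, the list of strings can be split further. The following is only a sketch: parse_pairs is a hypothetical helper, and it assumes every useful entry contains an "=" with text on both sides. Empty strings, like the ones in the Selenium output above, are skipped.

```python
def parse_pairs(lines):
    """Turn 'dutch = english' strings into (dutch, english) tuples, skipping blanks."""
    pairs = []
    for line in lines:
        # partition splits on the first "=" only, so the right-hand side is kept whole
        dutch, sep, english = line.partition("=")
        if sep and dutch.strip() and english.strip():
            pairs.append((dutch.strip(), english.strip()))
    return pairs


sample = ['man = man', 'vrouw = woman', 'een = a/an', '', '']
print(parse_pairs(sample))
# [('man', 'man'), ('vrouw', 'woman'), ('een', 'a/an')]
```

Passing the result to dict() gives a Dutch-to-English lookup, at the cost of keeping only the last translation if a word appears twice.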