Tags: python, python-3.x, web-scraping, pyquery

PyQuery won't return elements on a page


I've set up a Python script to open this web page with PyQuery.

import requests
from pyquery import PyQuery

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)
pqPage = PyQuery(page.content)

But pqPage("li") returns only an empty list, []. Meanwhile, pqPage.text() shows the text of the page's HTML, which includes li elements.

Why won't the code return a list of li elements? How do I make it do that?


Solution

  • It seems PyQuery has a problem with this page, maybe because it is an XHTML page, or maybe because it uses the namespace xmlns="http://www.w3.org/1999/xhtml".

    When I use

    pqPage.css('li')
    

    I get

    [<{http://www.w3.org/1999/xhtml}html#sfFrontendHtml>]
    

    which shows {http://www.w3.org/1999/xhtml} in the element name. That is the namespace. Some modules have problems with HTML that uses namespaces.
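The namespace effect can be reproduced offline with the standard library's xml.etree.ElementTree. This is a swapped-in XML parser used only for illustration, not PyQuery itself, and SAMPLE is a hypothetical stand-in for the real page. It is a minimal sketch of why a plain li lookup can come back empty when the document is parsed as namespaced XML:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real page: a tiny XHTML document that
# declares the same default namespace.
XHTML_NS = "http://www.w3.org/1999/xhtml"
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><ul><li>Mayor</li><li>Clerk</li></ul></body>"
    "</html>"
)

root = ET.fromstring(SAMPLE)

# The root tag carries the namespace in braces.
print(root.tag)

# A plain tag name does not match namespace-qualified elements.
plain_matches = root.findall(".//li")

# Qualifying the tag name with the namespace does match.
qualified_matches = root.findall(".//{%s}li" % XHTML_NS)

print(len(plain_matches), len(qualified_matches))
print([li.text for li in qualified_matches])
```

Every element parsed this way is named {namespace}li rather than li, so a selector that asks for plain li finds nothing.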


    I have no problem getting the li elements with BeautifulSoup:

    import requests
    from bs4 import BeautifulSoup as BS
    
    url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
    page = requests.get(url)
    
    soup = BS(page.text, 'html.parser')
    for item in soup.find_all('li'):
        print(item.text)
    

    EDIT: After some searching, I found that passing parser="html" to PyQuery() lets me get the li elements.

    import requests
    from pyquery import PyQuery
    
    url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
    page = requests.get(url)
    
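    # Force the HTML parser instead of letting PyQuery pick one.
    pqPage = PyQuery(page.text, parser="html")
    # Iterating a PyQuery object yields lxml elements.
    for item in pqPage('li p'):
        print(item.text)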