I've set up a Python script to open this web page with PyQuery:
import requests
from pyquery import PyQuery
url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)
pqPage = PyQuery(page.content)
But pqPage("li") returns only an empty list, []. Meanwhile, pqPage.text() shows the text of the page's HTML, which includes li elements.
Why won't the code return a list of li elements? How do I make it do that?
It seems PyQuery has a problem with this page, maybe because it is an XHTML page, or because it uses the namespace xmlns="http://www.w3.org/1999/xhtml".
When I use pqPage.css('li') I get [<{http://www.w3.org/1999/xhtml}html#sfFrontendHtml>], which is the root html element, and its tag is prefixed with {http://www.w3.org/1999/xhtml}: that prefix is the namespace. Some modules have problems with HTML that uses namespaces.
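To see what a default namespace does to a plain tag query, here is a minimal sketch using the standard library's xml.etree.ElementTree instead of PyQuery (a swapped-in parser, so treat it as an analogy rather than PyQuery's exact behavior). The XHTML string and the "Mayor"/"Clerk" entries are made-up sample data, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real page: same default namespace, tiny body
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ul>
      <li><p>Mayor</p></li>
      <li><p>Clerk</p></li>
    </ul>
  </body>
</html>"""

NS = {"x": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML)

# The default namespace is folded into every tag name...
print(root.tag)  # {http://www.w3.org/1999/xhtml}html

# ...so a plain "li" query matches nothing
plain = root.findall(".//li")

# Qualifying the name with the namespace finds the elements
qualified = root.findall(".//x:li", NS)
print(len(plain), len(qualified))  # 0 2
```

The empty result for the plain query mirrors the empty list from pqPage("li").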
I have no problem getting them with BeautifulSoup:
import requests
from bs4 import BeautifulSoup as BS
url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)
soup = BS(page.text, 'html.parser')
for item in soup.find_all('li'):
    print(item.text)
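BeautifulSoup with 'html.parser' works because that parser reads the markup as ordinary HTML: tag names carry no namespace, and xmlns is just another attribute on the html tag. A rough stdlib-only sketch of the same idea with html.parser.HTMLParser follows; LiTextCollector is a hypothetical helper name and the markup is invented sample data.

```python
from html.parser import HTMLParser

# Hypothetical sample markup with the same XHTML namespace declaration
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><ul><li><p>Mayor</p></li><li><p>Clerk</p></li></ul></body>
</html>"""


class LiTextCollector(HTMLParser):
    """Collect the text inside each top-level <li> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <li> elements we are currently inside
        self.items = []  # one text string per outermost <li>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive without any namespace prefix; xmlns on <html>
        # is delivered in attrs like any other attribute
        if tag == "li":
            self.depth += 1
            if self.depth == 1:
                self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that appears inside an <li>
        if self.depth:
            self.items[-1] += data


parser = LiTextCollector()
parser.feed(XHTML)
parser.close()
print(parser.items)  # ['Mayor', 'Clerk']
```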
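EDIT: after digging in Google I found that passing parser="html" to PyQuery() makes it find the li elements. By default PyQuery first tries lxml's XML parser and only falls back to the HTML parser if that fails; this page is well-formed XHTML, so the XML parse succeeds and every tag keeps its namespace.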
import requests
from pyquery import PyQuery
url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)
pqPage = PyQuery(page.text, parser="html")
for item in pqPage('li p'):
    print(item.text)
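If you would rather keep the XML parse, another option is to strip the namespaces before querying. PyQuery has a remove_namespaces() method for that, though I haven't tried it on this page. The sketch below shows the same idea with the standard library's ElementTree instead; strip_namespaces and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup with the XHTML default namespace
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><ul><li><p>Mayor</p></li><li><p>Clerk</p></li></ul></body>
</html>"""


def strip_namespaces(root):
    """Drop the '{uri}' prefix from every tag in the tree, in place."""
    for el in root.iter():
        # Comments/processing instructions have non-string tags; skip them
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(XHTML))

# Plain names work again, much like the pqPage('li p') query above
texts = [p.text for p in root.findall(".//li/p")]
print(texts)  # ['Mayor', 'Clerk']
```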