I am using Beautiful Soup to parse some HTML and am having trouble with the following code.
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from html.parser import HTMLParser
from bs4 import BeautifulSoup
results = 20
driver = webdriver.Chrome()
driver.get('https://www.hydroshare.org/search')
element = WebDriverWait(driver,60).until(EC.presence_of_element_located((By.ID, 'items-discovered_wrapper')))
innerHTML = driver.execute_script("return document.body.innerHTML")
soup = BeautifulSoup(innerHTML,'html.parser')
table = soup.find('table',{'id':'items-discovered'})
print(type(table))  # Prints <class 'bs4.element.Tag'>
children = table.findchildren()  # TypeError: 'NoneType' object is not callable
I cannot tell why type(table) reports a Tag object, yet when I call table.findchildren() (or any other function I expect the Tag object to allow), the call fails as if table had somehow become None. I have also run
print(table)
which prints an HTML string.
Does anyone know why this is happening or how to work around it?
I cannot tell here why printing out type(table) returns a Tag object
Because the type of table is a BS4 Tag.
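As for finding the children, the function you want to use is actually .findChildren(). You just missed capitalizing the 'C'. A Tag treats an attribute name it does not recognize as a search for a child tag with that name, so table.findchildren evaluates to None, and calling None is what raises the TypeError. So what you really need on that last line is: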
children = table.findChildren()
EDIT: As @abarnert pointed out, .findChildren() is technically deprecated in beautifulsoup4, although the function still exists. The newer way to get the same result is to call .find_all() without specifying any parameters. Both functions work and return the same results, so it would be best to use children = table.find_all()
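To see the behaviour without launching a browser, here is a minimal sketch that reproduces it on a small hand-written table. The HTML string and the row contents are made up for illustration; only the table id is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's HTML; only the table id comes from the question.
html = """
<table id="items-discovered">
  <tr><td>first</td></tr>
  <tr><td>second</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "items-discovered"})

# An unrecognized attribute name is treated as a lookup for a child tag with
# that name. There is no <findchildren> tag, so the attribute itself is None.
missing = table.findchildren
print(missing)  # None

# Calling that None value is what produces the TypeError from the question.
try:
    table.findchildren()
except TypeError as exc:
    print("TypeError:", exc)

# find_all() with no arguments returns every tag nested inside the table.
children = table.find_all()
print([tag.name for tag in children])
```

The same lookup rule is why table.tr returns the first row: any attribute name that is not a real method or property is passed to find().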