
bs4 cannot call findchildren due to NoneType


I am using Beautiful Soup to parse some HTML and am having trouble with the following code.

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from html.parser import HTMLParser
from bs4 import BeautifulSoup

results = 20
driver = webdriver.Chrome()
driver.get('https://www.hydroshare.org/search')
element = WebDriverWait(driver,60).until(EC.presence_of_element_located((By.ID, 'items-discovered_wrapper')))

innerHTML = driver.execute_script("return document.body.innerHTML")
soup = BeautifulSoup(innerHTML,'html.parser')
table = soup.find('table',{'id':'items-discovered'})
print(type(table))  # Prints <class 'bs4.element.Tag'>
children = table.findchildren()  # TypeError: 'NoneType' object is not callable

I cannot tell here why printing out type(table) returns a Tag object, but when I then try to run table.findchildren() (or any other method I expect the Tag object to have), the table somehow seems to become None. I have also typed in

print(table)

which prints an HTML string.

Does anyone know why this is happening or how to work around it?


Solution

  • I cannot tell here why printing out type(table) returns a Tag object

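    Because table's type is a bs4 Tag: soup.find() found the element, so table itself is not None. The None comes from the attribute lookup. Beautiful Soup treats an attribute name it does not recognize on a Tag as shorthand for find() with that tag name, so table.findchildren searches for a <findchildren> element, finds nothing, and returns None. Calling that None is what raises the TypeError.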

    As for finding the children, the method you want is .findChildren(), with a capital 'C'. Method names are case-sensitive, so what you need on that last line is:

    children = table.findChildren()

    EDIT: As @abarnert pointed out, .findChildren() is technically deprecated in beautifulsoup4, although the method still exists. The current way to get the same result is to call .find_all() without any arguments. Both methods work and return the same results, so it would be best to use children = table.find_all()
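
To see the behaviour without Selenium, here is a minimal sketch that uses a small hard-coded table. The markup is a hypothetical stand-in for the real page, which is rendered by JavaScript; only the id is taken from the question. It shows that the misspelled attribute resolves to None, and that .find_all() returns every descendant tag unless recursive=False is passed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; only the id comes from the question.
html = """
<table id="items-discovered">
  <tr><td>first</td></tr>
  <tr><td>second</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "items-discovered"})

# An unknown attribute name is treated as a tag name, so this behaves
# like table.find("findchildren") and yields None.
lookup = table.findchildren
print(lookup)

# Calling that None is what raises the TypeError from the question.
try:
    table.findchildren()
except TypeError as exc:
    print("TypeError:", exc)

# find_all() with no arguments returns every descendant tag.
descendants = table.find_all()
print([tag.name for tag in descendants])

# recursive=False limits the result to direct children only.
direct = table.find_all(recursive=False)
print([tag.name for tag in direct])
```

If only the table's immediate rows are wanted, the recursive=False form is probably the closer match to "children"; the no-argument form matches what .findChildren() returns.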