python, web-scraping, beautifulsoup, html-parsing, python-requests-html

What does 'AttributeError: 'NoneType' object has no attribute 'find_all'' mean in this code?


I am building a fairly simple BeautifulSoup/requests web scraper, but when I run it on a jobs website, the error

AttributeError: 'NoneType' object has no attribute 'find_all'

appears. Here is my code:

import requests
from bs4 import BeautifulSoup

URL = "https://uk.indeed.com/jobs?q&l=Norwich%2C%20Norfolk&vjk=139a4549fe3cc48b"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

results = soup.find(id="ResultsContainer")

job_elements = results.find_all("div", class_="resultContent")

python_jobs = results.find_all("h2", string="Python")

for job_element in job_elements:
    title_element = job_element.find("h2", class_="jobTitle")
    company_element = job_element.find("span", class_="companyName")
    location_element = job_element.find("div", class_="companyLocation")
    print(title_element)
    print(company_element)
    print(location_element)
    print()

Does anyone know what the issue is?


Solution

  • Check your selector for results: the id attribute should be resultsBody. soup.find() returns None when nothing matches, so with the wrong selector every line that uses results fails, because None has no attributes such as find_all:

    results = soup.find(id="resultsBody")
    

    Also, each job element is a td, not a div:

    job_elements = results.find_all("td", class_="resultContent")
    

    You could also combine both steps into a single CSS selector:

    job_elements = soup.select('#resultsBody td.resultContent')
    

    To get only the jobs whose title contains Python:

    job_elements = soup.select('#resultsBody td.resultContent:has(h2:-soup-contains("Python"))')
    

    Example

    import requests
    from bs4 import BeautifulSoup
    
    URL = "https://uk.indeed.com/jobs?q&l=Norwich%2C%20Norfolk&vjk=139a4549fe3cc48b"
    page = requests.get(URL)
    
    soup = BeautifulSoup(page.content, "html.parser")
    
    results = soup.find(id="resultsBody")
    
    job_elements = results.find_all("td", class_="resultContent")
    
    python_jobs = results.find_all("h2", string="Python")
    
    for job_element in job_elements:
        title_element = job_element.find("h2", class_="jobTitle")
        company_element = job_element.find("span", class_="companyName")
        location_element = job_element.find("div", class_="companyLocation")
        print(title_element)
        print(company_element)
        print(location_element)
        print()
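
Since the page markup can change again, it may help to guard against find() returning None instead of letting the AttributeError surface. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is made-up stand-in markup (not the real Indeed page), and extract_jobs, python_job_titles and text_or_none are hypothetical helper names introduced for illustration. The ids and class names are the ones used in the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup, NOT the real Indeed page, so the
# example runs without a network request.
SAMPLE_HTML = """
<div id="resultsBody">
  <table><tr>
    <td class="resultContent">
      <h2 class="jobTitle">Python Developer</h2>
      <span class="companyName">Acme Ltd</span>
      <div class="companyLocation">Norwich</div>
    </td>
    <td class="resultContent">
      <h2 class="jobTitle">Java Engineer</h2>
      <span class="companyName">Globex</span>
      <div class="companyLocation">Norfolk</div>
    </td>
  </tr></table>
</div>
"""


def text_or_none(element):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return element.get_text(strip=True) if element is not None else None


def extract_jobs(html, container_id):
    """Collect job dicts from the container with the given id.

    Returns an empty list when the container is missing, instead of
    raising AttributeError on None.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = soup.find(id=container_id)
    if results is None:  # find() found nothing: wrong or outdated selector
        return []
    jobs = []
    for job in results.find_all("td", class_="resultContent"):
        jobs.append({
            "title": text_or_none(job.find("h2", class_="jobTitle")),
            "company": text_or_none(job.find("span", class_="companyName")),
            "location": text_or_none(job.find("div", class_="companyLocation")),
        })
    return jobs


def python_job_titles(html):
    """Titles of jobs whose h2 text contains "Python", via one CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.select(
        '#resultsBody td.resultContent:has(h2:-soup-contains("Python"))'
    )
    return [text_or_none(cell.find("h2")) for cell in cells]


if __name__ == "__main__":
    print(extract_jobs(SAMPLE_HTML, "ResultsContainer"))  # id from the question
    print(extract_jobs(SAMPLE_HTML, "resultsBody"))       # id from the answer
    print(python_job_titles(SAMPLE_HTML))
```

With the id from the question the function simply returns an empty list, which makes a stale selector easier to spot than a traceback further down. select() never returns None (it returns an empty list when nothing matches), so the CSS-selector variant does not need the same guard.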