python web-scraping attributeerror

AttributeError: 'NoneType' object has no attribute 'text'. Web scraping Indeed with Python


The other question on this website about this error didn't solve my problem. My goal for this program is to scrape job postings from Indeed.com, but I'm running into an AttributeError. I don't understand why, because I've made sure the tags in my Python code match the ones in the HTML. Can anyone help me with this?

Code:

import urllib.request as urllib
from bs4 import BeautifulSoup
import csv

# empty array for results
results = []

# initialize the Indeed URL to url string
url = 'https://www.indeed.com/jobs?q=software+developer&l=Phoenix,+AZ&jt=fulltime&explvl=entry_level'
soup = BeautifulSoup(urllib.urlopen(url).read(), 'html.parser')
results = soup.find_all('div', attrs={'class': 'jobsearch-SerpJobCard'})

for i in results:
    title = i.find('div', attrs={"class":"title"})
    print('\ntitle:', title.text.strip())

    salary = i.find('span', attrs={"class":"salaryText"})
    print('salary:', salary.text.strip())

    company = i.find('span', attrs={"class":"company"})
    print('company:', company.text.strip())

Error log:

Traceback (most recent call last):
  File "c:/Users/Scott/Desktop/code/ScrapingIndeed/index.py", line 16, in <module>
    print('salary:', salary.text.strip())
AttributeError: 'NoneType' object has no attribute 'text'

Code from indeed.com I'm trying to scrape:

<span class="salaryText">
$15 - $30 an hour</span>

Solution

  • The answer is relatively simple: look at the source of the HTML you are scraping.

    Not every job card div contains a salary span. For the cards without one, i.find() returns None, and reading .text on None raises the AttributeError you are seeing.

    To get around that, check whether the result is None before reading its text.

    For example, take a look at the modified code below:

        salary = i.find('span', attrs={"class":"salaryText"})
        if salary is not None:
            print('salary:', salary.text.strip())
    

    The entire code is as follows:

    import urllib.request as urllib
    from bs4 import BeautifulSoup
    import csv
    
    # empty array for results
    results = []
    
    # initialize the Indeed URL to url string
    url = 'https://www.indeed.com/jobs?q=software+developer&l=Phoenix,+AZ&jt=fulltime&explvl=entry_level'
    soup = BeautifulSoup(urllib.urlopen(url).read(), 'html.parser')
    results = soup.find_all('div', attrs={'class': 'jobsearch-SerpJobCard'})
    
    for i in results:
        title = i.find('div', attrs={"class":"title"})
        print('\ntitle:', title.text.strip())
    
        # Not every job card lists a salary, so find() may return None here
        salary = i.find('span', attrs={"class":"salaryText"})
        if salary is not None:
            print('salary:', salary.text.strip())
    
        company = i.find('span', attrs={"class":"company"})
        print('company:', company.text.strip())
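
If you end up repeating the None check for several fields, one option is to move it into a small helper. The sketch below is only an illustration: text_or_default and its "N/A" fallback are hypothetical names, not part of BeautifulSoup, and SimpleNamespace objects stand in for the tags that i.find() would return so that the snippet runs without network access or bs4 installed.

```python
from types import SimpleNamespace


def text_or_default(tag, default="N/A"):
    """Return the stripped text of a tag, or `default` if find() gave None.

    Hypothetical helper: it only assumes `tag` is either None or an
    object with a `.text` string attribute, like a BeautifulSoup Tag.
    """
    if tag is None:
        return default
    return tag.text.strip()


# Stand-ins for the two things i.find(...) can return:
# an element with text, or None when the element is missing.
found = SimpleNamespace(text="\n$15 - $30 an hour")
missing = None

print("salary:", text_or_default(found))
print("salary:", text_or_default(missing))
```

Inside the loop you would then call it as text_or_default(i.find('span', attrs={"class":"salaryText"})), and the same call pattern covers the title and company lookups if those ever turn out to be missing as well.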