python web-crawler screen-scraping typeerror

Web Crawler: TypeError: coercing to Unicode: need string or buffer, NoneType found


I'm new to Python. For practice, I've written my own web crawler that is supposed to scrape Yelp.


I keep getting this error and can't seem to get past the first page:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 26, in yelpSpider
TypeError: coercing to Unicode: need string or buffer, NoneType found

Here is my code:

import requests
from BeautifulSoup import BeautifulSoup
def yelpSpider(maxPages):
    page = 0
    listURL = []
    listRATE = []
    listAREA = []
    listADDRESS = []
    listType = []
    while page <= maxPages:
        url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=Manhattan,+NY&start=0' + str(page)
        sourceCode = requests.get(url)
        plainText = sourceCode.text
        soup = BeautifulSoup(plainText)
        for bizName in soup.findAll('a',{'class':'biz-name js-analytics-click'}):
            href = 'https://www.yelp.com.com' + bizName.get('href')
            listURL.append(href)
        for rating in soup.findAll('img',{'class':'offscreen'}):
            stars = rating.get('alt')
            listRATE.append(stars)
        for area in soup.findAll('span',{'class':'neighborhood-str-list'}):
            listAREA.append(area.string)
        for type in soup.findAll('span',{'class':'category-str-list'}):
            listType.append(type)
        for tracker in range(int(page),int(page) + 10):
            print(listURL[tracker])
            print(' ')
            print(listAREA[tracker] + ' | ' + listRATE[tracker])
        page += 10

yelpSpider(20)

Thank you for your help!


Solution

  • The error is raised by the line print(listAREA[tracker] + ' | ' + listRATE[tracker]).

    It happens because some of the matched img tags have no alt attribute, so rating.get('alt') returns None and listRATE ends up as:

    ['4.5 star rating',
     '4.5 star rating',
     '4.5 star rating',
     '4.0 star rating',
     '4.0 star rating',
     '4.0 star rating',
     '4.0 star rating',
     '5.0 star rating',
     '4.5 star rating',
     '4.0 star rating',
     None,
     None,
     '4.0 star rating',
     '4.5 star rating',
     '4.0 star rating',
     '3.0 star rating',
     '4.0 star rating',
     '3.5 star rating',
     '4.5 star rating',
     '4.5 star rating',
     '5.0 star rating',
     '4.0 star rating',
     None,
     None]
    

    As you can see, the entry at index 10 (the first value of tracker on the second page) is None, and None cannot be concatenated with a string.

    So you have a couple of alternatives. One is to use an or expression that replaces None with ''. Your code becomes:

    print((listAREA[tracker] or '') + ' | ' + (listRATE[tracker] or ''))
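To see the or fallback in isolation, here is a minimal sketch. The helper name format_listing, its placeholder parameter, and the sample value 'Midtown West' are made up for illustration and are not part of the original crawler.

```python
def format_listing(area, rating, placeholder=''):
    """Join area and rating, substituting `placeholder` for None.

    Hypothetical helper: `x or placeholder` evaluates to `placeholder`
    whenever x is None (or an empty string), so the concatenation
    never sees a None operand.
    """
    return (area or placeholder) + ' | ' + (rating or placeholder)


# Sample values invented for the example.
print(format_listing('Midtown West', '4.5 star rating'))
print(format_listing('Midtown West', None))
print(format_listing(None, None, placeholder='N/A'))
```

Note that or also replaces an empty string, which is harmless here because an empty string and the default placeholder are the same thing.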
    

    The other option is to fix listRATE before printing:

    listRATE = list(map(lambda text: text if text is not None else 'N/A', listRATE))
    

    After executing the above, your list will look like this:

    ['4.5 star rating',
     '4.5 star rating',
     '4.5 star rating',
     '4.0 star rating',
     '4.0 star rating',
     '4.0 star rating',
     '4.0 star rating',
     '5.0 star rating',
     '4.5 star rating',
     '4.0 star rating',
     'N/A',
     'N/A',
     '4.0 star rating',
     '4.5 star rating',
     '4.0 star rating',
     '3.0 star rating',
     '4.0 star rating',
     '3.5 star rating',
     '4.5 star rating',
     '4.5 star rating',
     '5.0 star rating',
     '4.0 star rating',
     'N/A',
     'N/A']
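
If you find map with a lambda hard to read, the same cleanup can probably be written as a list comprehension. This is only a sketch of the second option: the short ratings list below is a shortened stand-in for listRATE, and the name cleaned is just for illustration.

```python
# Shortened stand-in for listRATE, with None where the alt attribute was missing.
ratings = ['4.0 star rating', None, None, '4.5 star rating']

# Replace every None with 'N/A' and keep all other entries unchanged.
cleaned = ['N/A' if text is None else text for text in ratings]

print(cleaned)
```

Every entry in the cleaned list is a string, so concatenating it with ' | ' no longer raises the TypeError.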