Tags: python, pandas, beautifulsoup, traceback

TypeError: Cannot read object of type 'list'


As far as I can tell, I haven't created a list anywhere, yet pandas is giving me this error:

TypeError: Cannot read object of type 'list'

Any thoughts?

Python newbie, so go easy.

Any and all help is appreciated.

Sample URL:

https://nclbgc.org/search/licenseDetails?licenseNumber=80479

Here is the full traceback:

Traceback (most recent call last):
  File "ncscribble.py", line 26, in <module>
    df = pd.read_html(url)[0].dropna(how='all')
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\io\html.py", line 987, in read_html
    displayed_only=displayed_only)
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\io\html.py", line 815, in _parse
    raise_with_traceback(retained)
  File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\compat\__init__.py", line 404, in raise_with_traceback
    raise exc.with_traceback(traceback)
TypeError: Cannot read object of type 'list'

Full Code:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import time
import csv
import pandas as pd
import os
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders

def license_exists(soup):
    with open('NC_urls.csv','r') as csvf:
        urls = csv.reader(csvf)
        for url in urls:
            if soup(class_='btn btn-primary"'):
                return False
            else:
                return True


with open('NC_urls.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        df = pd.read_html(url)[0].dropna(how='all')
        df = df.groupby(0)[1].apply(lambda x: ' '.join(x.dropna())).to_frame().rename_axis(None).T
        if not license_exists(soup(page, 'html.parser')):
            # if the license is present we don't want to parse any more urls.

            break


df.to_csv('NC_Licenses_Daily.csv', index=False)

Solution

  • When you encounter a TypeError, it is usually a good idea to print the value you are passing in, like this:

        for url in urls:
            print(repr(url))
            df = pd.read_html(url)[0].dropna(how='all')
    

    It will give you:

    ['https://nclbgc.org/search/licenseDetails?licenseNumber=80479']
    

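    This is because csv.reader yields each row as a list of strings, one per column, even when the row has only a single column. So url is a one-element list, not a string. You need to take the first element and pass that to pd.read_html: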

        for url in urls:
            df = pd.read_html(url[0])[0].dropna(how='all')
    

    Your code also passes page to BeautifulSoup, but page is never defined. To get the page data, you could fetch it with requests:

    import requests
    page = requests.get(url[0]).content
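
If you would rather not index with [0] at every call site, one option is to flatten the rows once, up front. The following is only a minimal sketch: read_urls is a hypothetical helper name (not part of your code or of pandas), and it assumes each URL sits in the first column of NC_urls.csv.

```python
import csv


def read_urls(csv_path):
    """Return the first column of each non-empty CSV row as a plain string.

    Hypothetical helper for illustration; assumes the URL is in column 0.
    """
    with open(csv_path, newline='') as csvf:
        # csv.reader yields every row as a list, so take element 0 explicitly
        # and skip blank rows (which come back as empty lists).
        return [row[0].strip() for row in csv.reader(csvf) if row and row[0].strip()]
```

Each item returned by read_urls('NC_urls.csv') is then an ordinary str, so inside the loop it can be passed as-is to pd.read_html(url) or requests.get(url).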