As far as I can tell, I haven't created a list anywhere, yet I'm getting a
TypeError: Cannot read object of type 'list'.
Any thoughts?
Python newbie, so go easy.
Any and all help is appreciated.
Sample URL:
https://nclbgc.org/search/licenseDetails?licenseNumber=80479
Here is the full traceback:
Traceback (most recent call last):
File "ncscribble.py", line 26, in <module>
df = pd.read_html(url)[0].dropna(how='all')
File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\io\html.py", line 987, in read_html
displayed_only=displayed_only)
File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\io\html.py", line 815, in _parse
raise_with_traceback(retained)
File "C:\Users\rkrouse\Desktop\Python\lib\site-packages\pandas\compat\__init__.py", line 404, in raise_with_traceback
raise exc.with_traceback(traceback)
TypeError: Cannot read object of type 'list'
Full code:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import time
import csv
import pandas as pd
import os
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders
def license_exists(soup):
    with open('NC_urls.csv', 'r') as csvf:
        urls = csv.reader(csvf)
        for url in urls:
            if soup(class_='btn btn-primary"'):
                return False
            else:
                return True
with open('NC_urls.csv', 'r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        df = pd.read_html(url)[0].dropna(how='all')
        df = df.groupby(0)[1].apply(lambda x: ' '.join(x.dropna())).to_frame().rename_axis(None).T
        if not license_exists(soup(page, 'html.parser')):
            # if the license is present we don't want to parse any more urls.
            break
        df.to_csv('NC_Licenses_Daily.csv', index=False)
When you encounter a TypeError, it is usually a good idea to print the offending value, like this:
for url in urls:
    print(repr(url))
    df = pd.read_html(url)[0].dropna(how='all')
It will give you:
['https://nclbgc.org/search/licenseDetails?licenseNumber=80479']
This is because csv.reader yields each row as a list of fields, even when the row has only one column. You need to take the first element of that list and pass the resulting string to pd.read_html:
for url in urls:
    df = pd.read_html(url[0])[0].dropna(how='all')
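You can see this row-is-a-list behavior with the csv module alone; the minimal sketch below uses io.StringIO to stand in for your NC_urls.csv file:

```python
import csv
import io

# Simulate a one-column NC_urls.csv: one URL per line.
sample = "https://nclbgc.org/search/licenseDetails?licenseNumber=80479\n"

rows = list(csv.reader(io.StringIO(sample)))
print(rows)        # each row is a list of fields, even with a single column
print(rows[0][0])  # indexing [0] recovers the URL as a plain string
```

Passing rows[0] (a list) to pd.read_html triggers the error you saw; passing rows[0][0] (a string) does not.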
To get the page data, you could use requests:
import requests
page = requests.get(url[0]).content