I'm stuck on a scraper in ScraperWiki. I just want the text from the li elements in the ul with dir='ltr'. I run this script every week, and a new sentence can look very similar to an earlier one while still being a completely new entry. That's why I want to store the date alongside each sentence.
import scraperwiki
from mechanize import Browser
import lxml.html
from datetime import date, timedelta, datetime
from scraperwiki import sqlite
datum = date.today()
print datum
url = 'http://www.knvb.nl/nieuws/excessenav/actueel'
mech = Browser()
page = mech.open(url)
tree = lxml.html.fromstring(page.read())
# print tree.xpath('//ul[@dir="ltr"]/li') # access li elements
# print tree.xpath('//ul[@dir="ltr"]/li/text()') # access text in li elements
uitspraak = tree.xpath("//ul[@dir='ltr']/li/text()")
print uitspraak
# Saving data:
unique_keys = [ 'datum', 'uitspraak' ]
data = { 'datum':datum, 'uitspraak':uitspraak }
scraperwiki.sql.save(unique_keys, data)
I get the following error:
Traceback (most recent call last):
  File "./code/scraper", line 28, in <module>
    scraperwiki.sql.save(unique_keys, data)
  File "/usr/local/lib/python2.7/dist-packages/scraperwiki/sqlite.py", line 34, in save
    return dt.upsert(data, table_name = table_name)
  File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 301, in upsert
    self.insert(upsert=True, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 284, in insert
    self.execute(sql, values, commit=False)
  File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 138, in execute
    raise self.sqlite3.InterfaceError(unicode(msg) + '\nTry converting types or pickling.')
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
Try converting types or pickling.
How can I make this scraper save my data?
This XPath query returns a list of results, and passing that whole list as a single column value is what causes the unsupported type error:
tree.xpath("//ul[@dir='ltr']/li/text()")
Instead, save each result as its own row:
unique_keys = ['datum', 'uitspraak']
for x in uitspraak:
    data = {'datum': datum, 'uitspraak': x}
    scraperwiki.sql.save(unique_keys, data)
There may be a better way to store multiple results; I've never used ScraperWiki before and don't know its API.
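To see the same behaviour outside ScraperWiki, here is a minimal sketch that uses only the standard library's sqlite3 module rather than the scraperwiki API. The table name `swdata`, the helper `save_rows`, and the sample sentences are assumptions made up for illustration. It shows that SQLite refuses a list as a bound parameter, while one row per string saves fine, and that a unique key on (datum, uitspraak) keeps reruns on the same day from duplicating rows.

```python
import sqlite3
from datetime import date


def save_rows(conn, datum, texts):
    """Insert one row per text; (datum, uitspraak) acts as the unique key.

    Hypothetical helper, not part of scraperwiki.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS swdata ("
        "datum TEXT, uitspraak TEXT, UNIQUE (datum, uitspraak))"
    )
    for text in texts:
        # Each parameter is a plain string, which SQLite can bind.
        conn.execute(
            "INSERT OR REPLACE INTO swdata (datum, uitspraak) VALUES (?, ?)",
            (datum.isoformat(), text.strip()),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")

# Made-up stand-ins for the list that tree.xpath(...) returns.
sample = ["first ruling ", "second ruling"]

# Binding the whole list as one parameter fails, as in the traceback.
try:
    conn.execute("SELECT ?", (sample,))
    list_rejected = False
except sqlite3.Error:
    list_rejected = True

# Saving one row per list item works, and a second run adds no duplicates.
save_rows(conn, date(2014, 1, 6), sample)
save_rows(conn, date(2014, 1, 6), sample)
rows = conn.execute(
    "SELECT datum, uitspraak FROM swdata ORDER BY uitspraak"
).fetchall()
```

The date is stored as an ISO string here so the column type is unambiguous; whether ScraperWiki needs that conversion for `datum` is not something this sketch establishes.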