I am trying to retrieve data from DBpedia, but I get an error every time I run the code.
The code in Python is:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject
WHERE { <http://dbpedia.org/resource/Musée_du_Louvre> dcterms:subject ?subject }
""")
# JSON example
print '\n\n*** JSON Example'
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
    print result["subject"]["value"]
I believe that I must use a different character for "é" in "Musée_du_Louvre", but I can't figure out which one. Thanks!
The first problem is that SPARQLWrapper seems to expect its query to be in unicode, but you're passing it a UTF-8 encoded string; that's why you get a UnicodeDecodeError. Instead, you should pass it a unicode object, either by decoding your UTF-8 string:
unicode_obj = some_utf8_string.decode('utf-8')
or by using a unicode literal:
unicode_obj = u'Hello World'
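To make the difference between the two kinds of string concrete, here is a minimal sketch. The variable names are just for illustration, and the byte values are the UTF-8 encoding of the resource name from the question:

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for "Musée_du_Louvre"; the "é" occupies two bytes (0xC3 0xA9).
utf8_bytes = b'Mus\xc3\xa9e_du_Louvre'

# Decoding turns the byte string into a text (unicode) object.
text = utf8_bytes.decode('utf-8')

print(len(utf8_bytes))  # 16 bytes
print(len(text))        # 15 characters
```

The byte string is one element longer than the decoded text because "é" is a single character but two bytes in UTF-8.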
Passing it a unicode object avoids that UnicodeDecodeError, but doesn't yield any results. So it looks like the DBpedia API expects URLs containing non-ASCII characters to be percent-encoded. Therefore you need to encode the URL beforehand using urllib.quote_plus:
from urllib import quote_plus
encoded_url = quote_plus(url, safe='/:')
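As a rough illustration of what that call produces, here is a small sketch that should run on both Python 2 and Python 3 (where quote_plus lives in urllib.parse). The helper name encode_iri is my own and not part of any library:

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2


def encode_iri(iri):
    """Percent-encode non-ASCII characters, leaving '/' and ':' untouched.

    encode_iri is a hypothetical helper name used only for this example.
    """
    # quote_plus works on bytes in both Python versions, so encode text first.
    if not isinstance(iri, bytes):
        iri = iri.encode('utf-8')
    return quote_plus(iri, safe='/:')


print(encode_iri(u'http://dbpedia.org/resource/Mus\xe9e_du_Louvre'))
# http://dbpedia.org/resource/Mus%C3%A9e_du_Louvre
```

Passing safe='/:' matters here: without it, the slashes and the colon of the URL would be percent-encoded as well.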
With these two changes your code could look like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from SPARQLWrapper import SPARQLWrapper, JSON
from urllib import quote_plus
url = 'http://dbpedia.org/resource/Musée_du_Louvre'
encoded_url = quote_plus(url, safe='/:')
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
query = u"""
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?subject
WHERE { <%s> dcterms:subject ?subject }
""" % encoded_url
sparql.setQuery(query)
# JSON example
print '\n\n*** JSON Example'
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
    print result["subject"]["value"]
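If you want to query more than one resource, the query assembly could be wrapped in a function. This is only a sketch: build_subject_query and QUERY_TEMPLATE are names I made up, the dcterms prefix is declared explicitly rather than relying on the endpoint to predefine it, and nothing is sent over the network here. The returned string would be handed to sparql.setQuery() as in the script above:

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2

# Query text with a placeholder for the percent-encoded resource URL.
QUERY_TEMPLATE = u"""
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?subject
WHERE { <%s> dcterms:subject ?subject }
"""


def build_subject_query(resource_iri):
    """Return the SPARQL query text for one resource (hypothetical helper).

    resource_iri is expected to be a text (unicode) string.
    """
    encoded = quote_plus(resource_iri.encode('utf-8'), safe='/:')
    return QUERY_TEMPLATE % encoded


query = build_subject_query(u'http://dbpedia.org/resource/Mus\xe9e_du_Louvre')
print(query)
```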