
Error when I retrieve data from dbpedia


I am trying to retrieve data from DBpedia, but I get an error every time I run the code.

The code in Python is:

#!/usr/bin/python
# -*- coding: utf-8 -*-

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?subject
    WHERE { <http://dbpedia.org/resource/Musée_du_Louvre> dcterms:subject ?subject }
""")

# JSON example
print '\n\n*** JSON Example'
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
    print result["subject"]["value"]

I believe I need to use a different character for "é" in "Musée_du_Louvre", but I can't figure out which one. Thanks!


Solution

  • The first problem is that SPARQLWrapper seems to expect its query to be unicode, but you're passing it a UTF-8 encoded byte string, which is why you get a UnicodeDecodeError. Instead you should pass it a unicode object, either by decoding your UTF-8 string:

    unicode_obj = some_utf8_string.decode('utf-8')
    

    or by using a unicode literal:

    unicode_obj = u'Hello World'
    

    Passing it a unicode object avoids the UnicodeDecodeError, but the query still doesn't yield any results. The second problem appears to be that the DBpedia endpoint expects URLs containing non-ASCII characters to be percent-encoded. Therefore you need to encode the URL beforehand using urllib.quote_plus, passing safe='/:' so that the slashes and the colon in the URL are left untouched:

    from urllib import quote_plus
    encoded_url = quote_plus(url, safe='/:')
    

    With these two changes your code could look like this:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    
    from SPARQLWrapper import SPARQLWrapper, JSON
    from urllib import quote_plus
    
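    # UTF-8 byte string (see the coding declaration above)
    url = 'http://dbpedia.org/resource/Musée_du_Louvre'
    # Percent-encode the non-ASCII bytes, leaving '/' and ':' as they are
    encoded_url = quote_plus(url, safe='/:')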
    
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    
    query = u"""
        PREFIX dcterms: <http://purl.org/dc/terms/>
        SELECT ?subject
        WHERE { <%s> dcterms:subject ?subject }
    """ % encoded_url
    
    sparql.setQuery(query)
    
    # JSON example
    print '\n\n*** JSON Example'
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    for result in results["results"]["bindings"]:
        print result["subject"]["value"]
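
The code above targets Python 2. As a rough sketch of the same percent-encoding step on Python 3, where quote_plus lives in urllib.parse, something like the following should work. The helper name build_subject_query is made up for illustration, and the dcterms prefix is declared explicitly here as an assumption instead of relying on whatever prefixes the endpoint predefines. The snippet only builds the query string and does not contact DBpedia:

```python
from urllib.parse import quote_plus


def build_subject_query(resource_url):
    """Build a SPARQL query for the dcterms:subject values of a resource.

    Hypothetical helper, not part of SPARQLWrapper.
    """
    # Percent-encode non-ASCII characters as UTF-8, leaving '/' and ':' intact.
    encoded_url = quote_plus(resource_url, safe='/:')
    return """
        PREFIX dcterms: <http://purl.org/dc/terms/>
        SELECT ?subject
        WHERE { <%s> dcterms:subject ?subject }
    """ % encoded_url


url = 'http://dbpedia.org/resource/Musée_du_Louvre'
encoded = quote_plus(url, safe='/:')
print(encoded)  # http://dbpedia.org/resource/Mus%C3%A9e_du_Louvre

query = build_subject_query(url)
print(query)
```

The resulting string can then be passed to sparql.setQuery(query) exactly as in the Python 2 version. In Python 3 all str objects are already unicode, so the decoding step from the first point is not needed.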