RDFlib query not working


I wrote a Python script that should run through a list of DBpedia URIs and run a query on each of them. However, for some reason I get an error on

qres = g.query(query) 

when I run this code. Does anyone know why this happens and how I could fix this? I'm really stuck and I'm getting behind on my thesis timeline so the stress is really building.

Code:

import rdflib
import csv
import pandas as pd

colnames = ['Link']

list2 = pd.read_csv('C:/Users/Frank/Google Drive/Master Scriptie/testtest3.csv', sep=',', header=None, usecols=[2], names=colnames)
saved_column = list2.Link 
outputfile = open('C:/Users/Frank/Google Drive/Master Scriptie/code files/dbpedia_output/test_dataset_uri_subject.csv', 'w')

reader = csv.reader(saved_column)

g = rdflib.Graph()
for uri in reader:
    uri2 = "".join(str(x) for x in uri)
    uri2 = uri2[1:].rstrip()
    print (uri2)
    result = g.parse("http://dbpedia.org" + uri2)
    print (result)
    query = "SELECT ?subject WHERE {<http://dbpedia.org" + uri2 + "> dbo:wikiPageRedirects*/dct:subject ?subject .}"
    print ("query: " + query)
    qres = g.query(query)
    for singlerow in qres:
        subject_final = "%s" % singlerow
        outputfile.write("{0}, {1} \n".format(uri, subject_final))

Error message in cmd:

/resource/Sheldon_J._Plankton
[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory']].
query: SELECT ?subject WHERE {<http://dbpedia.org/resource/Sheldon_J._Plankton> dbo:wikiPageRedirects*/dct:subject ?subject .}
Traceback (most recent call last):
  File "rdfimport.py", line 47, in <module>
    qres = g.query(query)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\graph.py", line 1089, in query
    query_object, initBindings, initNs, **kwargs))
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\processor.py", line 75, in query
    query = translateQuery(parsetree, base, initNs)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 764, in translateQuery
    q[1], visitPost=functools.partial(translatePName, prologue=prologue))
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 384, in traverse
    r = _traverse(tree, visitPre, visitPost)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 345, in _traverse
    e[k] = _traverse(val, visitPre, visitPost)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 345, in _traverse
    e[k] = _traverse(val, visitPre, visitPost)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in _traverse
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in <listcomp>
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 345, in _traverse
    e[k] = _traverse(val, visitPre, visitPost)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in _traverse
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in <listcomp>
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in _traverse
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in <listcomp>
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 345, in _traverse
    e[k] = _traverse(val, visitPre, visitPost)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in _traverse
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in <listcomp>
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 345, in _traverse
    e[k] = _traverse(val, visitPre, visitPost)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in _traverse
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 339, in <listcomp>
    return [_traverse(x, visitPre, visitPost) for x in e]
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 345, in _traverse
    e[k] = _traverse(val, visitPre, visitPost)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 347, in _traverse
    _e = visitPost(e)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\algebra.py", line 142, in translatePName
    return prologue.absolutize(p)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\sparql.py", line 374, in absolutize
    return self.resolvePName(iri.prefix, iri.localname)
  File "C:\Users\Frank\AppData\Local\Programs\Python\Python36-32\lib\site-packages\rdflib\plugins\sparql\sparql.py", line 357, in resolvePName
    raise Exception('Unknown namespace prefix : %s' % prefix)
Exception: Unknown namespace prefix : dct

Thanks in advance :)

EDIT:

I believe something goes wrong in

result = g.parse("http://dbpedia.org" + uri2)

The URI it attempts to parse there in this example is "http://dbpedia.org/resource/Sheldon_J._Plankton"

which also gives an error if I directly put that URI in g.parse. Might this be because that URI is "wrong", since it redirects to

"http://dbpedia.org/resource/Plankton_(character)".

I handled this in my query with dbo:wikiPageRedirects, but that only applies after the parse. So I think the problem lies there, but how can I get the right page using dbo:wikiPageRedirects if I can't parse the URI first to get that page?


Solution

  • The error message says that the prefix dct is not recognised. RDFLib has DCTERMS built in, and you can bind it, along with your own prefixes, on the graph:

    from rdflib.namespace import DCTERMS, Namespace
    g.bind("dct", DCTERMS)
    g.bind("dbo", Namespace("http://dbpedia.org/ontology/"))
    g.bind("dbr", Namespace("http://dbpedia.org/resource/"))
    

    Assuming uri2 is a dbpedia resource and only contains the final part of the URI (i.e. "Sheldon_J._Plankton"), then the SPARQL query to get the redirect page becomes:

    q = "SELECT ?subject WHERE {{ dbr:{} dbo:wikiPageRedirects ?subject. }}".format
    result = g.query(q(uri2))
    for row in result:
        print(row.subject)
    

    To get the subjects of the redirect target, this query should work, provided the target's triples are in your data. If they are not, you may first need to run g.parse on the URIs returned by the previous query to add them to your graph:

    q = "SELECT ?subject WHERE {{ dbr:{} dbo:wikiPageRedirects ?redirect. ?redirect dct:subject ?subject. }}".format
    result = g.query(q(uri2))
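
As an alternative to binding prefixes on the graph, the prefixes can be declared in the query text itself, and the resource can be written as a full IRI in angle brackets. That sidesteps a limitation of the dbr:{} form: a name such as Plankton_(character) is not a valid prefixed name in SPARQL unless the parentheses are escaped. The following is only a minimal sketch; build_subject_query and its resource_path argument are hypothetical names, and it assumes the input looks like the "/resource/Sheldon_J._Plankton" values printed in the question.

```python
import re

# Standard namespace IRIs for the two prefixes used in the question's query.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dct": "http://purl.org/dc/terms/",
}

# Characters the SPARQL grammar does not allow inside <...>:
# control characters, space, and < > " { } | ^ ` \
_ILLEGAL_IN_IRI = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def build_subject_query(resource_path, base="http://dbpedia.org"):
    """Return a self-contained SPARQL query string for one resource.

    resource_path is a hypothetical input such as
    "/resource/Sheldon_J._Plankton" (an assumption based on the
    output shown in the question).
    """
    iri = base + resource_path.strip()
    if _ILLEGAL_IN_IRI.search(iri):
        raise ValueError("not a valid IRI for SPARQL: %r" % iri)
    # Declare the prefixes in the query so nothing has to be bound on the graph.
    prologue = "\n".join(
        "PREFIX %s: <%s>" % (prefix, ns) for prefix, ns in sorted(PREFIXES.items())
    )
    # Same property path as in the question: follow zero or more redirects,
    # then read dct:subject.
    body = (
        "SELECT ?subject WHERE { <%s> dbo:wikiPageRedirects*/dct:subject ?subject . }"
        % iri
    )
    return prologue + "\n" + body


print(build_subject_query("/resource/Sheldon_J._Plankton"))
```

The returned string can be passed to g.query(...) as it is, because the PREFIX lines travel with the query. Whether it returns any rows still depends on the relevant triples having been parsed into g.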