rdf sparql dbpedia

Is it possible to retrieve a triple's source dataset in DBpedia?


Suppose I queried DBpedia like this:

select * where
{
  ?x ?y ?z .
  filter (?x = <http://dbpedia.org/resource/Abracadabra>)
}

and got a lot of triples as a result:

x   y   z  
http://dbpedia.org/resource/Abracadabra     http://www.w3.org/2002/07/owl#sameAs    http://de.dbpedia.org/resource/Abrakadabra  
http://dbpedia.org/resource/Abracadabra     http://www.w3.org/2002/07/owl#sameAs    http://fr.dbpedia.org/resource/Abracadabra  
http://dbpedia.org/resource/Abracadabra     http://www.w3.org/2002/07/owl#sameAs    http://ko.dbpedia.org/resource/아브라카다브라  
...

Is it possible to determine which dataset each of these triples came from?
I want to download some of those datasets and use them locally, but first I have to find out which of them are useful to me, based on the triples they contain.

In the worst case, I'd like to know which dataset(s) contain the rdfs:label values.

P.S. This approach doesn't work: ?g is always http://dbpedia.org


Solution

  • Of the directories in the listing that you linked to, I think you'd want to pull down the data from the English Wikipedia. That said, there are still lots of files in there. The DBpedia Data Set (3.9) page has more information about the different files that you can download. Perhaps most importantly, it says:

    Find the properties used in the different DBpedia data sets here.

    That link leads to DBpedia 3.9 Data Set Properties, which, I think, tells you which properties are in which datasets. To answer your "worst case" specifically, it says that rdfs:label values are stored in the Titles dataset. Even though it's called Titles, I think it's what you'll find as

    labels_en.nq.bz2
    labels_en.nt.bz2
    labels_en.tql.bz2
    labels_en.ttl.bz2 
    

    in the listing that you linked to. I don't know whether or not there's a way to automate looking up the datasets. It would be nice if the table in DBpedia 3.9 Data Set Properties were encoded somewhere and could be queried, because then this would be easy.
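
    In the meantime, one workaround is to inspect the downloaded files directly and tally which predicates each one contains. Here is a minimal Python sketch, assuming the dumps are bzip2-compressed N-Triples or N-Quads with one statement per line (as the .nt.bz2 and .nq.bz2 extensions suggest). The function names predicate_counts and predicate_counts_in_dump are hypothetical, not part of any DBpedia tooling:

```python
import bz2
import sys
from collections import Counter


def predicate_counts(lines):
    """Count predicate IRIs in an iterable of N-Triples/N-Quads lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Subject and predicate contain no whitespace, so splitting twice
        # isolates them; the object (and graph, if any) stay in parts[2].
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        counts[parts[1].strip("<>")] += 1
    return counts


def predicate_counts_in_dump(path):
    """Tally predicates in a bzip2-compressed, line-based dump file."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return predicate_counts(fh)


if __name__ == "__main__":
    # Usage: python script.py labels_en.nt.bz2 [more files ...]
    for dump_path in sys.argv[1:]:
        print(dump_path)
        for predicate, n in predicate_counts_in_dump(dump_path).most_common():
            print(f"  {n}\t{predicate}")
```

    Running it over labels_en.nt.bz2 should show whether http://www.w3.org/2000/01/rdf-schema#label is the predicate stored there. It reads the file as a stream, so large dumps don't have to fit in memory.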