rdf sparql semantic-web dbpedia linked-data

sparql / dbpedia regarding extracting rdf:type person


I'd like to extract all the DBpedia entries of rdf:type Person using DBpedia and SPARQL, neither of which I understand well.

I was mostly successful with the following query, varying the offset between runs. Is there a better way? Basically, I'd like to get every person from the English Wikipedia.

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?name ?birth ?description ?person WHERE {
     ?person dbo:birthDate ?birth .
     ?person foaf:name ?name .
     ?person rdfs:comment ?description .
     FILTER (LANG(?description) = 'en') .
}
ORDER BY ?name
OFFSET 100

Solution

  • You're going about it in roughly the right way, though you should use both OFFSET and LIMIT so that you can paginate the results (and for OFFSET and LIMIT to be useful, you need to keep the ORDER BY, since without a fixed order the pages aren't guaranteed to be consistent). You're also declaring more prefixes than you need: your query only uses three (dbo:, foaf:, and rdfs:), so you only need to declare those three. You can also ask specifically for things of type dbo:Person. There are 1649645 of them:

    select (count(*) as ?n) where {
     ?person a dbo:Person 
    }
    

    1649645

    You should also check the languages of strings with langMatches, not =, because langMatches(lang(?x), 'en') also accepts regional tags such as en-GB, which a plain equality test rejects. The web service that you can work with interactively predefines some prefixes, so I usually follow those. You might also want to select only English names, and probably order by the URI, since the names aren't always perfect:

    select ?person ?name ?birth ?description where {
      ?person a dbo:Person ;
              foaf:name ?name ;
              dbo:birthDate ?birth ;
              dbo:abstract ?description
      filter langMatches(lang(?name),'en')
      filter langMatches(lang(?description),'en')
    }
    order by ?person
    offset 100
    limit 50
    

    SPARQL results
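
To make the difference between = and langMatches concrete, here is a small Python sketch of the matching rule. It is only an illustration of the basic filtering scheme from RFC 4647 that SPARQL's langMatches follows, not the endpoint's actual implementation; the function name lang_matches is made up for this example.

```python
def lang_matches(tag, lang_range):
    """Approximate SPARQL langMatches (RFC 4647 basic filtering).

    Hypothetical helper for illustration only.
    """
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        # "*" matches any literal that has a language tag at all.
        return tag != ""
    # Match the exact tag, or the range followed by a "-" subtag.
    return tag == lang_range or tag.startswith(lang_range + "-")


# Compare plain equality with the range-based match for a few tags.
for tag in ["en", "en-GB", "EN-us", "de", ""]:
    print(repr(tag), "equals:", tag == "en", "langMatches:", lang_matches(tag, "en"))
```

A literal tagged en-GB fails the equality test against 'en' but passes the range match, which is why the equality filter can silently drop English strings.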

    Of course, if you want lots of data, you might want to just download it and store it locally. See DBpedia 2014 Downloads.
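
If you do stay with the endpoint, the OFFSET/LIMIT pagination can be driven from a short script instead of editing the offset by hand. The following is a minimal sketch using only the Python standard library. The endpoint URL https://dbpedia.org/sparql, the page size, and the helper names (page_query, fetch_page, iter_rows) are assumptions made for this example; public endpoints may also cap or throttle results, so treat it as a starting point rather than a finished harvester.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumption: public DBpedia endpoint
PAGE_SIZE = 50                           # assumption: same page size as the query above

# Prefixes are declared explicitly so the query does not rely on endpoint defaults.
QUERY_TEMPLATE = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
select ?person ?name ?birth ?description where {{
  ?person a dbo:Person ;
          foaf:name ?name ;
          dbo:birthDate ?birth ;
          dbo:abstract ?description
  filter langMatches(lang(?name), 'en')
  filter langMatches(lang(?description), 'en')
}}
order by ?person
offset {offset}
limit {limit}
"""


def page_query(page, page_size=PAGE_SIZE):
    """Return the query text for a zero-based page number."""
    return QUERY_TEMPLATE.format(offset=page * page_size, limit=page_size)


def fetch_page(page, page_size=PAGE_SIZE):
    """Send one page's query and return its result bindings (needs network)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": page_query(page, page_size)})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def iter_rows(fetch=fetch_page, page_size=PAGE_SIZE):
    """Yield rows page by page, stopping at the first short (or empty) page."""
    page = 0
    while True:
        rows = fetch(page, page_size)
        yield from rows
        if len(rows) < page_size:
            break
        page += 1
```

Calling page_query(2) produces the query with offset 100 and limit 50, and looping over iter_rows() walks through all pages in order. The fetch function is passed in as a parameter so the paging logic can be tried against local data before hitting the endpoint.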