SPARQL: Extracting Unique Entities from DBpedia


Consider the following query:

PREFIX category: <http://dbpedia.org/resource/Category:>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT *
WHERE {
    ?s dcterms:subject category:Living_people .
    ?s foaf:name ?name
}
LIMIT 10000

When I run it, the result contains rows like these:

Sir Alexander Chapman Ferguson
Sir Alex Ferguson

Though these are different entries, they clearly refer to the same entity. I would like to eliminate such duplicates in the query itself, at the SPARQL endpoint, rather than post-process the output, which could be difficult in this case. What should I change in my query?


Solution

  • As you can see when you run your query, both rows you mention refer to the same resource: <http://dbpedia.org/resource/Alex_Ferguson>. You get multiple rows simply because this person has multiple names.

    So if you just need to avoid duplicates in your application, make sure that it treats each unique value of ?s in the query result as a separate person.

    On the other hand, if your problem is that you get multiple names for a person, you could perhaps use some other properties. For example, dbpedia:fullname has only a single entry, as do the properties dbpedia:surname and dbpedia:givenName.
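
    If you would rather collapse the duplicates in the query itself, one option is to group by the resource and keep a single name per group. This is only a minimal sketch, assuming the endpoint supports SPARQL 1.1 aggregates; ?anyName is a variable name chosen here for illustration, and the foaf prefix is declared explicitly in case the endpoint does not predefine it:

    ```sparql
    PREFIX category: <http://dbpedia.org/resource/Category:>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    # One row per resource: group the matches by ?s and
    # keep one of the names found for each resource.
    SELECT ?s (SAMPLE(?name) AS ?anyName)
    WHERE {
        ?s dcterms:subject category:Living_people .
        ?s foaf:name ?name
    }
    GROUP BY ?s
    LIMIT 10000
    ```

    SAMPLE returns an arbitrary name from each group, so which of the two names shown in the question comes back is not guaranteed.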