
Filtering DBpedia disambiguation pages


I have a SPARQL query, and I want to eliminate all disambiguation resources from its results. How can I do this? This is my query:

prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> 
prefix foaf: <http://xmlns.com/foaf/0.1/> 

select distinct ?Nom ?resource ?url where {
   ?resource rdfs:label ?Nom.
   ?resource foaf:isPrimaryTopicOf ?url.
   FILTER (langMatches( lang(?Nom), "EN" )).
   ?Nom <bif:contains> "Apple".
}  

Solution

  • You can add the following prefix and filter to your query:

    prefix dbo: <http://dbpedia.org/ontology/>
    
    filter not exists {
      ?resource dbo:wikiPageRedirects*/dbo:wikiPageDisambiguates ?dis
    }
    

    This excludes any resource that disambiguates some article, and also any resource that reaches such a page through one or more redirects. That gives you a query like this:

    prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
    prefix foaf: <http://xmlns.com/foaf/0.1/> 
    prefix dbo: <http://dbpedia.org/ontology/>
    
    select distinct ?Nom ?resource ?url where {
       ?resource rdfs:label ?Nom.
       ?resource foaf:isPrimaryTopicOf ?url.
       FILTER (langMatches( lang(?Nom), "EN" )).
       ?Nom <bif:contains> "Apple".
       filter not exists {
         ?resource dbo:wikiPageRedirects*/dbo:wikiPageDisambiguates ?dis
       }
    }
    

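    If you want to see which results the filter actually removes, one option is to invert it: keep the same search patterns, but require the path instead of excluding it. This is a sketch that relies on Virtuoso's bif:contains exactly as the original query does. ?target is a new variable name introduced here for the page at which the redirect chain reaches a disambiguation page. Because * matches zero or more steps, ?target is ?resource itself when the resource is a disambiguation page, and a page further along the redirect chain otherwise. What it returns depends on the DBpedia release the endpoint serves.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dbo: <http://dbpedia.org/ontology/>

# Inverse of the filter: list the resources that the
# "filter not exists" clause would remove.
select distinct ?resource ?target where {
   ?resource rdfs:label ?Nom.
   ?resource foaf:isPrimaryTopicOf ?url.
   FILTER (langMatches( lang(?Nom), "EN" )).
   ?Nom <bif:contains> "Apple".
   # Zero or more redirects, so ?target may be ?resource itself.
   ?resource dbo:wikiPageRedirects* ?target.
   # The page reached must disambiguate at least one article.
   ?target dbo:wikiPageDisambiguates ?dis.
}
```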

    Now, even though that removes all the disambiguation pages, you may still have results that include "disambiguation" in the title. For instance, one of the results is:

        "The Little Apple (disambiguation)"@en
        http://dbpedia.org/resource/The_Little_Apple_(disambiguation)
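
    One way to check why this resource survived the filter is to list which of the two properties it has. This is a small sketch. The IRI is written out in full so the parentheses in the resource name need no escaping, and VALUES only restricts ?p to the two properties of interest. Based on the explanation that follows, you would expect a dbo:wikiPageRedirects row and no dbo:wikiPageDisambiguates row, although newer DBpedia releases may differ.

```sparql
prefix dbo: <http://dbpedia.org/ontology/>

# Show any redirect or disambiguation links of this one resource.
select ?p ?o where {
   values ?p { dbo:wikiPageRedirects dbo:wikiPageDisambiguates }
   <http://dbpedia.org/resource/The_Little_Apple_(disambiguation)> ?p ?o .
}
```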

    Even though it has "disambiguation" in the name, it's not a disambiguation page: it doesn't have any values for dbo:wikiPageDisambiguates. It does redirect to another page, though. If you want to filter out resources that redirect to something else, too, you can modify the filter:

    filter not exists { ?resource dbo:wikiPageRedirects|dbo:wikiPageDisambiguates ?dis }

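    This filters out any resource that either redirects to something or disambiguates something. It is actually a simpler filter: since every resource that redirects anywhere is now excluded, the * repetition from the previous filter is no longer needed. This makes your query: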

    prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
    prefix foaf: <http://xmlns.com/foaf/0.1/> 
    prefix dbo: <http://dbpedia.org/ontology/>
    
    select distinct ?Nom ?resource ?url where {
       ?resource rdfs:label ?Nom.
       ?resource foaf:isPrimaryTopicOf ?url.
       FILTER (langMatches( lang(?Nom), "EN" )).
       ?Nom <bif:contains> "Apple".
    
       filter not exists {
         ?resource dbo:wikiPageRedirects|dbo:wikiPageDisambiguates ?dis
       }
    }
    

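    The alternative path can also be written as two separate filters. NOT EXISTS over A|B removes a row when either pattern matches, which is the same as requiring that neither matches, so this sketch should return the same results as the query above. Splitting the conditions makes each one explicit and lets you drop one of them independently, for example to keep redirect pages. ?redirectTarget and ?disambiguated are illustrative variable names, not anything DBpedia requires.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dbo: <http://dbpedia.org/ontology/>

select distinct ?Nom ?resource ?url where {
   ?resource rdfs:label ?Nom.
   ?resource foaf:isPrimaryTopicOf ?url.
   FILTER (langMatches( lang(?Nom), "EN" )).
   ?Nom <bif:contains> "Apple".

   # Same effect as dbo:wikiPageRedirects|dbo:wikiPageDisambiguates:
   # drop the resource if it has either property.
   filter not exists { ?resource dbo:wikiPageRedirects ?redirectTarget }
   filter not exists { ?resource dbo:wikiPageDisambiguates ?disambiguated }
}
```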