
Find the subjects that connect entities in DBpedia with SPARQL


I'm extracting entities from a text, and most of the time I get multiple entities, for example <http://dbpedia.org/resource/NASA>, <http://dbpedia.org/resource/IPhone>, and <http://dbpedia.org/resource/Apple_Inc.>

These entities don't share the same dct:subject. Is there a way to query a path to get a list of the subjects that connect my entities?

My goal is to create a kind of "page rank" to find the most relevant subjects for a given entity.

Preferably with a count of how many steps lie between them.

I've tried to brute-force it: start with an entity, get all its subjects, then get all entities for each subject, and so on, but the queries are starting to get enormous.


Solution

  • Springing from @AKSW's comments...

    One option places no limit on the length of the skos:broader path. It will exceed the resource consumption limits on the public DBpedia endpoint, but it could be run on a private instance (in the cloud or wherever) where you can relax those limits --

    PREFIX   dbr:  <http://dbpedia.org/resource/>
    PREFIX   dct:  <http://purl.org/dc/terms/>
    PREFIX  skos:  <http://www.w3.org/2004/02/skos/core#>
    
    SELECT DISTINCT ?cat 
    WHERE
      { <http://dbpedia.org/resource/Apple_Inc.>
            dct:subject/skos:broader*  ?cat . 
        dbr:IPhone 
            dct:subject/skos:broader*  ?cat . }
    

    The succinct option, using Virtuoso-specific syntax (based on an early draft of SPARQL Property Paths) to limit the path's length (here requiring at least 1 skos:broader and permitting up to 3) --

    PREFIX   dbr:  <http://dbpedia.org/resource/>
    PREFIX   dct:  <http://purl.org/dc/terms/>
    PREFIX  skos:  <http://www.w3.org/2004/02/skos/core#>
    
    SELECT DISTINCT ?cat 
    WHERE
      { ?cat
           ^skos:broader{1,3}/^dct:subject
               <http://dbpedia.org/resource/Apple_Inc.> , 
               dbr:IPhone 
      }
    

    Another succinct option, this time using standard SPARQL Property Paths syntax to limit the path's length --

    PREFIX   dbr:  <http://dbpedia.org/resource/>
    PREFIX   dct:  <http://purl.org/dc/terms/>
    PREFIX  skos:  <http://www.w3.org/2004/02/skos/core#>
    
    SELECT DISTINCT ?cat 
    WHERE
      { ?cat
           ^skos:broader/^skos:broader?/^skos:broader?/^dct:subject
               <http://dbpedia.org/resource/Apple_Inc.> , 
               dbr:IPhone 
      }
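
For bounds other than 1 to 3, the standard-syntax expansion above can be generated mechanically. Below is a minimal Python sketch, assuming a hypothetical helper named `bounded_path` (it is not part of any SPARQL library and only builds a string): it repeats the step `lo` times and then appends `hi - lo` optional steps.

```python
def bounded_path(step: str, lo: int, hi: int) -> str:
    """Expand step{lo,hi} into standard SPARQL 1.1 property path syntax.

    Hypothetical helper for illustration; it only builds a string.
    """
    if lo < 0 or hi < lo or hi == 0:
        raise ValueError("need 0 <= lo <= hi and hi >= 1")
    # lo mandatory repetitions, followed by (hi - lo) optional ones.
    parts = [step] * lo + [step + "?"] * (hi - lo)
    return "/".join(parts)


# Inverted form, read from the category towards the entity.
inverse = bounded_path("^skos:broader", 1, 3) + "/^dct:subject"
# Uninverted form, read from the entity towards the category.
forward = "dct:subject/" + bounded_path("skos:broader", 1, 3)
print(inverse)
print(forward)
```

The resulting string can be pasted into the WHERE clause in place of the hand-written path.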
    

    You can also use two statements with uninverted paths in the WHERE clause, first in Virtuoso-specific form --

      { <http://dbpedia.org/resource/Apple_Inc.> 
           dct:subject/skos:broader{1,3}   ?cat  .
        dbr:IPhone 
           dct:subject/skos:broader{1,3}   ?cat  .
      }
    

    -- and then in standard SPARQL --

      { <http://dbpedia.org/resource/Apple_Inc.> 
           dct:subject/skos:broader/skos:broader?/skos:broader?   ?cat  .
        dbr:IPhone 
           dct:subject/skos:broader/skos:broader?/skos:broader?   ?cat  .
      }
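
None of the queries above report how many steps separate an entity from a shared category, which the question asks for. One possible way to get that counter is to fetch the dct:subject and skos:broader links and run a breadth-first search on the client side. The Python sketch below does this over a small hand-made graph. The category names and links in `SUBJECT` and `BROADER` are invented for illustration and are not real DBpedia data, and `category_depths` and `shared_categories` are hypothetical helper names. Unlike the {1,3} queries, this sketch also counts the direct dct:subject categories, at depth 0.

```python
from collections import deque

# Toy data, invented for illustration (NOT real DBpedia content).
SUBJECT = {  # entity -> categories via dct:subject
    "dbr:IPhone": ["cat:Smartphones"],
    "dbr:Apple_Inc.": ["cat:Electronics_companies"],
}
BROADER = {  # category -> parent categories via skos:broader
    "cat:Smartphones": ["cat:Mobile_phones"],
    "cat:Mobile_phones": ["cat:Consumer_electronics"],
    "cat:Electronics_companies": ["cat:Consumer_electronics"],
}


def category_depths(entity, max_steps=3):
    """Map each reachable category to its minimum number of skos:broader steps."""
    depths = {cat: 0 for cat in SUBJECT.get(entity, [])}
    queue = deque(depths)
    while queue:
        cat = queue.popleft()
        if depths[cat] == max_steps:
            continue  # do not expand beyond the bound
        for parent in BROADER.get(cat, []):
            if parent not in depths:  # first visit in BFS is the shortest path
                depths[parent] = depths[cat] + 1
                queue.append(parent)
    return depths


def shared_categories(a, b, max_steps=3):
    """Return {category: (steps_from_a, steps_from_b)} for categories reachable from both."""
    da, db = category_depths(a, max_steps), category_depths(b, max_steps)
    return {cat: (da[cat], db[cat]) for cat in da.keys() & db.keys()}


print(shared_categories("dbr:IPhone", "dbr:Apple_Inc."))
```

Summing the two counts for each shared category gives a rough distance that could feed the ranking described in the question. The `max_steps` bound keeps the search small, in the same spirit as the bounded paths above.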