Tags: rdf, sparql, semantic-web, dbpedia, linked-data

How to get intersection-like behavior with SPARQL 1.1's VALUES?


Using SPARQL 1.1's VALUES, the following query returns all predicates that have Einstein or Knuth as the subject (along with their labels).

PREFIX dbp: <http://dbpedia.org/resource/>

SELECT DISTINCT ?sub ?outpred ?label
{
  VALUES ?sub { dbp:Albert_Einstein dbp:Donald_Knuth }
  ?sub ?outpred [] .
  ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}


Is it possible to use VALUES to get an intersection rather than a union of the predicates? Or am I misunderstanding what VALUES is for?

EDIT: Clarification

For a simplified example, say there are these triples:

<Einstein>  <influenced>    <John>
<Einstein>  <influenced>    <Knuth>
<Einstein>  <born>          <Mars>
<Einstein>  <died>          <Los Angeles>
<Knuth>     <influenced>    <Kirby>
<Knuth>     <born>          <Mars>
<Knuth>     <wrote>         <TAOCP>
<Knuth>     <drove>         <Truck>

The "union" I'm getting is all unique predicates attached to either subject (grouped by predicate, with blank lines between groups for clarity):

|  ?sub    |  ?pred     |
-------------------------
<Einstein>  <influenced>
<Knuth>     <influenced>

<Einstein>  <born>
<Knuth>     <born>

<Einstein>  <died>

<Knuth>     <wrote>

<Knuth>     <drove>

The "intersection" I'm after is all unique predicates common to both subjects:

|  ?sub    |  ?pred     |
-------------------------
<Einstein>  <influenced>
<Knuth>     <influenced>

<Einstein>  <born>
<Knuth>     <born>

Solution


    You can use a query like the following. The trick is to group by the predicate and keep only those predicates that have exactly two distinct subjects (Einstein and Knuth).

    prefix dbp: <http://dbpedia.org/resource/>

    select distinct ?outpred ?label
    {
      values ?sub { dbp:Albert_Einstein dbp:Donald_Knuth }
      ?sub ?outpred [] .
      ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    }
    group by ?outpred ?label
    having (count(distinct ?sub) = 2)
    

    Of course, this still retrieves all the data you would need for the union and then condenses it. I don't expect that to be much of a problem, but if it is (e.g., if you're taking the intersection over many subjects), you can also list the subjects in separate triple patterns:

    prefix dbp: <http://dbpedia.org/resource/>

    select distinct ?outpred ?label
    {
      dbp:Albert_Einstein ?outpred [] .
      dbp:Donald_Knuth ?outpred [] .
      ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    }
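    If you also want ?sub in the output, like the "intersection" table in the question, one option is to compute the common predicates in a subquery and join them back to the per-subject triples. This is a sketch rather than a query verified against DBpedia; it assumes the same dbp: prefix as above and adds an rdfs: prefix for readability.

    ```sparql
    prefix dbp:  <http://dbpedia.org/resource/>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    select distinct ?sub ?outpred ?label
    {
      # one row per listed subject and each predicate it uses
      values ?sub { dbp:Albert_Einstein dbp:Donald_Knuth }
      ?sub ?outpred [] .

      # keep only predicates that every listed subject uses
      {
        select ?outpred
        {
          values ?s { dbp:Albert_Einstein dbp:Donald_Knuth }
          ?s ?outpred [] .
        }
        group by ?outpred
        having (count(distinct ?s) = 2)
      }

      ?outpred rdfs:label ?label .
    }
    ```

    The subject list appears twice here, so both copies (and the 2 in the having clause) have to be kept in sync.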
    

    Discussion

    Is it possible to use this VALUES feature to expose an intersection rather than a union of the predicates? Or am I misunderstanding what VALUES are for?

    VALUES essentially provides another set of bindings that is joined with the other bindings, so it can't compute an intersection for you the way you'd like. However, an "intersection" of the sort you're looking for here isn't too hard:

    prefix dbp: <http://dbpedia.org/resource/>

    select distinct ?outpred ?label
    {
      dbp:Albert_Einstein ?outpred [] .
      dbp:Donald_Knuth ?outpred [] .
      ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    }
    

    That said, this could mean a lot of triple patterns to write, so you might want a query where the only thing you need to change is the list of values. You can specify the values, group by the property and label (i.e., the variables not bound by VALUES), and keep only those solutions for which count(distinct ?sub) equals the number of values you specified. E.g.:

    prefix dbp: <http://dbpedia.org/resource/>

    select distinct ?outpred ?label
    {
      values ?sub { dbp:Albert_Einstein dbp:Donald_Knuth }
      ?sub ?outpred [] .
      ?outpred <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    }
    group by ?outpred ?label
    having (count(distinct ?sub) = 2)
    

    This way, in order to get count(distinct ?sub) to be 2, you must have had ?sub ?outpred [] match for both ?sub = Einstein and ?sub = Knuth.
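    To see the counting on the small example from the question, here is a self-contained sketch that inlines those triples as a VALUES table instead of querying a store, so it should run on any SPARQL 1.1 engine. The ex: namespace and IRIs such as ex:Los_Angeles are made-up stand-ins for the question's <Einstein>, <born>, and so on.

    ```sparql
    prefix ex: <http://example.org/>

    select ?pred (group_concat(distinct str(?sub); separator=" ") as ?subjects)
    {
      # the subjects to intersect over
      values ?sub { ex:Einstein ex:Knuth }
      # the toy triples from the question, inlined so no store is needed
      values (?sub ?pred ?obj) {
        (ex:Einstein ex:influenced ex:John)
        (ex:Einstein ex:influenced ex:Knuth)
        (ex:Einstein ex:born       ex:Mars)
        (ex:Einstein ex:died       ex:Los_Angeles)
        (ex:Knuth    ex:influenced ex:Kirby)
        (ex:Knuth    ex:born       ex:Mars)
        (ex:Knuth    ex:wrote      ex:TAOCP)
        (ex:Knuth    ex:drove      ex:Truck)
      }
    }
    group by ?pred
    # distinct matters: ex:influenced has three rows (two for Einstein)
    # but only two distinct subjects
    having (count(distinct ?sub) = 2)
    ```

    This should return two rows, for ex:influenced and ex:born, matching the "intersection" table in the question. With count(?sub) = 2 instead, only ex:born would survive, because the three ex:influenced rows would be counted as 3.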

    Checking the Approach

    We can use the DBpedia endpoint to work through these. First, a simplified query:

    select distinct ?s ?p where {
      values ?s { dbpedia:Albert_Einstein dbpedia:Donald_Knuth }
      ?s ?p []
    }
    


    s                                             p
    http://dbpedia.org/resource/Albert_Einstein   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    http://dbpedia.org/resource/Donald_Knuth      http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    http://dbpedia.org/resource/Albert_Einstein   http://www.w3.org/2002/07/owl#sameAs
    http://dbpedia.org/resource/Donald_Knuth      http://www.w3.org/2002/07/owl#sameAs
    ⋮                                            ⋮
    

    Now, it doesn't make sense to ask for an intersection while we're still selecting ?s, because Einstein ≠ Knuth, so there's never any intersection. But we can take an intersection on ?p. Here's a query that gets all the properties for which both have values:

    select distinct ?p where {
      dbpedia:Albert_Einstein ?p [] .
      dbpedia:Donald_Knuth ?p []
    }
    


    A similar query counts the results for us:

    select (count(distinct ?p) as ?np) where {
      dbpedia:Albert_Einstein ?p [] .
      dbpedia:Donald_Knuth ?p [] .
    }
    

    There are 45 properties that they both have.

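    The group by version uses count(distinct ?s) rather than count(?s), because a subject can have several triples with the same property (e.g., several rdf:type values), and those extra rows would otherwise inflate the count:

    select distinct ?p where {
      values ?s { dbpedia:Albert_Einstein dbpedia:Donald_Knuth }
      ?s ?p []
    }
    group by ?p
    having (count(distinct ?s) = 2)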
    

    Now let's make sure that the group by approach finds the same number of properties:

    select (count(*) as ?np) where {
      select distinct ?p where {
        values ?s { dbpedia:Albert_Einstein dbpedia:Donald_Knuth }
        ?s ?p []
      }
      group by ?p
      having (count(distinct ?s) = 2)
    }
    

    This also returns 45, matching the count from the first approach.
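    Equal counts strongly suggest, but don't by themselves prove, that the two approaches return the same properties. One way to check directly is a sketch like the following, which lists properties the triple-pattern approach finds but the group by approach doesn't (it relies on the dbpedia: prefix that the DBpedia endpoint predefines, as the queries above do). If it returns no rows, every property from the first approach is also in the second; since both contain 45 properties, the two sets are then identical.

    ```sparql
    select distinct ?p where {
      # properties found by the triple-pattern approach
      dbpedia:Albert_Einstein ?p [] .
      dbpedia:Donald_Knuth ?p [] .

      # remove those also found by the group by approach
      minus {
        select ?p where {
          values ?s { dbpedia:Albert_Einstein dbpedia:Donald_Knuth }
          ?s ?p []
        }
        group by ?p
        having (count(distinct ?s) = 2)
      }
    }
    ```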