rdf sparql semantic-web dbpedia linked-data

Multiple counts in SPARQL query


I would like to create a SPARQL query that contains two counts.

The query should get the 'neighbours of neighbours' of A (A → B → C, where A is the start node), and should report, for each C, how many paths there are from A to C and how many "inlinks" there are to C from anywhere. The result set should be as follows:

C | #C |  C_INLINKS
--------------------------
A | 2  | 123
B | 3  | 234

Where #C is the number of paths to C from starting node A.

I can create the counts separately, but I don't know how to combine these:

Count neighbours of neighbours:

select ?c (count(?c) as ?countc) WHERE {
   <http://dbpedia.org/resource/AFC_Ajax> ?p1 ?b.
   ?b ?p2 ?c.
   FILTER (regex(str(?c), '^http://dbpedia.org/resource/'))
}
GROUP BY ?c
ORDER BY DESC(?countc)
LIMIT 100

Count inlinks to neighbours of neighbours:

select ?c (count(?inlink) as ?inlinks) WHERE {
   <http://dbpedia.org/resource/AFC_Ajax> ?p1 ?b.
   ?b ?p2 ?c.
   ?inlink ?p3 ?c
   FILTER (regex(str(?c), '^http://dbpedia.org/resource/'))
}
GROUP BY ?c
ORDER BY DESC(?inlinks)
LIMIT 100

Is it possible to combine these two queries? Thank you!


Solution

  • The counts you're trying to extract require you to group by different things. group by lets you specify what you're counting with respect to. E.g., when you say select (count(?x) as ?xn) {...} group by ?y, you're asking "how many ?x's appear for each value of ?y?" The counts you're looking for are "how many C's per A" and "how many inlinks per C". That means that in one case you'd need to group by ?a and in the other you'd need to group by ?c. However, since you've got a fixed ?a here, this might be a little bit easier: grouping by ?c alone covers both. Counting the distinct paths (?p1, ?b, ?p2) is a little bit tricky, since count(distinct …) accepts only a single expression. However, you can be sneaky and count distinct concat(str(?p1),str(?b),str(?p2)), which is a single expression and should be unique for each ?p1 ?b ?p2 combination. The distinct matters in both counts, because joining the paths with ?inlink ?p ?c repeats every path once per matching inlink triple, and every inlink once per path. Then I think you'd be looking for a query like this:

    select ?c
           (count(distinct concat(str(?p1),str(?b),str(?p2))) as ?n_paths)
           (count(distinct ?inlink) as ?n_inlink)
    where {
      dbpedia:AFC_Ajax ?p1 ?b . ?b ?p2 ?c .
      ?inlink ?p ?c
      filter strstarts(str(?c),str(dbpedia:))
    }
    group by ?c
    

    SPARQL results

    c                                                           n_paths n_inlink
    ----------------------------------------------------------------------------
    http://dbpedia.org/resource/AFC_Ajax                        32      540
    http://dbpedia.org/resource/Category:AFC_Ajax_players       17      484
    http://dbpedia.org/resource/Category:Living_people          17      659447
    http://dbpedia.org/resource/Category:Eredivisie_players     13      2232
    http://dbpedia.org/resource/Category:Dutch_footballers      12      2141
    http://dbpedia.org/resource/Category:1994_births             6      3605
    …
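
    If the concat trick feels too fragile, a possible alternative is to count the paths in a subquery, so that each count is taken under its own grouping before the inlinks are joined in. This is only a sketch and has not been run against DBpedia: it assumes the endpoint supports SPARQL 1.1 subqueries, it declares the dbpedia: prefix explicitly rather than relying on the endpoint's predefined one, and it assumes every solution of the inner pattern is a distinct (?p1, ?b, ?p2) combination, so that count(*) counts paths. The order by clause is an addition of mine, not part of the query above.

    ```sparql
    PREFIX dbpedia: <http://dbpedia.org/resource/>

    SELECT ?c ?n_paths (COUNT(DISTINCT ?inlink) AS ?n_inlink)
    WHERE {
      # Inner query: one solution per (?p1, ?b, ?p2) path from the fixed
      # start node, so COUNT(*) per ?c is the number of paths to ?c
      # (assuming the pattern yields no duplicate solutions).
      {
        SELECT ?c (COUNT(*) AS ?n_paths)
        WHERE {
          dbpedia:AFC_Ajax ?p1 ?b .
          ?b ?p2 ?c .
          FILTER strstarts(str(?c), str(dbpedia:))
        }
        GROUP BY ?c
      }
      # Outer pattern: every triple pointing at ?c. DISTINCT is still
      # needed because one ?inlink may reach ?c through several predicates.
      ?inlink ?p ?c .
    }
    # ?n_paths is constant per ?c, but it must appear in GROUP BY
    # to be allowed in the SELECT clause.
    GROUP BY ?c ?n_paths
    ORDER BY DESC(?n_paths)
    ```

    Because the path count is finished before the inlinks are joined, the two counts no longer multiply each other's rows, which is the reason the single-query version needs distinct on the path count.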