Search code examples
sparql

Count the number of children and the children of these children


Let's suppose I have a list of grandparents each one of them having children and recursively each one of them having their own children. Like a family tree.

Let's say one of the grandparents is Aerys Targaryen.

Aerys Targaryen has 3 children, Rhaegar, Viserys and Daenerys.

Rhaegar has 1 child, John Snow.

Viserys has 0 children.

Daenerys has 3 children, Drogon, Rhaegal and Viserion.

I would like to make a table showing:

1st column: Name of the grandparent

2nd column: Number of children the grandparent has(1st column)

3rd column: Name of the children

4th column: Number of children the parent has (3rd column)

5th column: Name of the children

I would like to have a result like this:

+-----------------+--------+----------+--------+-----------+
|     GPName      | PCount |  PName   | CCount |   CName   |
+-----------------+--------+----------+--------+-----------+
| Aerys Targaryen |      3 | Rhaegar  |      1 | John Snow |
| Aerys Targaryen |      3 | Viserys  |      0 |           |
| Aerys Targaryen |      3 | Daenerys |      3 | Drogon    |
| Aerys Targaryen |      3 | Daenerys |      3 | Rhaegal   |
| Aerys Targaryen |      3 | Daenerys |      3 | Viserion  |
+-----------------+--------+----------+--------+-----------+

I have managed to make these queries, but none is what I want.

    1.
SELECT ?GPName (count(?PName) as ?number)(group_concat(?PName;separator=",") as ?PName)
WHERE
{
     ?gp rdf:type gag:GrandParent .
     ?gp gag:name ?GPName .
     ?p gag:has_parent ?gp .
     ?p rdf:type gag:Parent .
     ?p gag:name ?PName
} group by ?GPName 

I only get the first 3 columns and not in the form that I want.

    2.
SELECT ?GPName 
       (count(?PName) as ?PCount)
       ?PName
       (count(?CName) as ?CCount)
WHERE
{
      ?gp rdf:type gag:GrandParent .
      ?gp gag:name ?GPName .
      ?p rdf:type gag:Parent .
      ?p gag:name ?PName.
      ?p gag:has_parent ?gp .
      ?c rdf:type gag:Child .
      ?c gag:name ?CName.
      ?c gag:has_parent ?p.
} GROUP BY ?GPName ?PName

It only shows columns 1,3,4 as I want them.


Solution

  • As it was pointed out, you need two separate subqueries to calculate the required aggregates for each of the solutions. Added an OPTIONAL in case there are no children for a parent and used a coalesce expression to set it to zero in the projection if that was the case.

    E.g something along the lines of the following query (I've checked it against the dataset at http://geo.linkedopendata.gr/gag-endpoint ) :

    PREFIX gag: <http://geo.linkedopendata.gr/gag/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    
    select ?gpName ?cnt ?PName (coalesce(?cc, 0) as ?cnt2) ?CName
    WHERE 
    {
        {
            select ?gp (count(?pe1) as ?cnt) {
                ?gp a gag:Περιφέρεια .
                ?pe1 a gag:Περιφερειακή_Ενότητα  .
                ?pe1 gag:ανήκει_σε ?gp .
            }group by ?gp 
        }
        ?gp gag:έχει_επίσημο_όνομα ?gpName .
        ?pe a gag:Περιφερειακή_Ενότητα  .
        ?pe gag:ανήκει_σε ?gp .
        ?pe gag:έχει_επίσημο_όνομα ?PName .
        optional {
            ?c rdf:type gag:Δήμος .
            ?c gag:έχει_επίσημο_όνομα ?CName.
            ?c gag:ανήκει_σε ?pe .
            {
                select ?pe (count(?c1) as ?cc) {
                    ?c1 rdf:type gag:Δήμος .
                    ?c1 gag:ανήκει_σε ?pe .
                } group by ?pe
            }
        }
    }
    

    p.s. As an alternative to the subqueries, you may create a cartesian product for the children and grandchildren and count the distinct bindings, e.g.:

    PREFIX gag: <http://geo.linkedopendata.gr/gag/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    select ?gpName ?PName ?CName (count(distinct ?p1) as ?cnt1) (count(distinct ?c1) as ?cnt2) 
        WHERE 
        {
            ?gp a gag:Περιφέρεια .
            ?gp gag:έχει_επίσημο_όνομα ?gpName .
            ?pe gag:ανήκει_σε ?gp .
            ?pe a gag:Περιφερειακή_Ενότητα  .
            ?pe gag:έχει_επίσημο_όνομα ?PName .
            ?p1 gag:ανήκει_σε ?gp . # cartesian product with ?pe and ?p1
            ?c gag:ανήκει_σε ?pe .
            ?c gag:έχει_επίσημο_όνομα ?CName .
            ?c1 gag:ανήκει_σε ?pe . # cartesian product with ?c and ?c1
    
        } group by ?gpName ?PName ?CName
        order by ?gpName ?PName