Computing custom histogram metrics to understand graph structure using SPARQL

I want to analyze the structure of a graph. One query I would like to try extracts the different combinations of subject type - edge type - object type that occur in the graph.

This is a follow-up to a couple of my earlier questions:

How to generate all triples that fit a particular node type or/and edge type using SPARQL query?

How to list and count the different types of node and edge entities in the graph data using SPARQL query?

For example, suppose a semantic graph has edge types (property/predicate types) such as:

IsCapitalOf
IsCityOf
HasPopulation, etc.

and node types such as:

Cities
Countries
Rivers
Mountains, etc.

Then I should get:

City->IsCapitalOf->Country 4 tuples
City->IsCityOf->Country 21 tuples
River->IsPartOf->Country 3 tuples
River->PassesThrough->City 11 tuples

and so on...

Note: there should be no literals in the object position, because I want the unit subgraph pattern (subject type, edge type, object type).

To summarize, I think the way I'd approach this would be:

a) Compute the distinct subject types in the graph
b) Compute the distinct edge types in the graph
c) Compute the distinct object types in the graph
(a/b/c have been answered in my previous questions)

d) Generate all combinations of subject type -> edge type -> object type (no literals) that occur, with a count for each pattern, like a histogram

I hope the question is articulated reasonably well.

Edit: Adding sample data (a few rows from the full dataset). It is the YAGO dataset, which is publicly available.

<Alabama>   rdf:type    <wordnet_country_108544813> .
<Abraham_Lincoln>   rdf:type    <wordnet_president_110467179> .
<Aristotle> rdf:type    <wordnet_writer_110794014> .
<Academy_Award_for_Best_Art_Direction>  rdf:type    <wordnet_award_106696483> .
<Academy_Award> rdf:type    <wordnet_award_106696483> .
<Actrius>   rdf:type    <wordnet_movie_106613686> .
<Animalia_(book)>   rdf:type    <wordnet_book_106410904> .
<Ayn_Rand>  rdf:type    <wordnet_novelist_110363573> .
<Allan_Dwan>    rdf:type    <wikicategory_American_film_directors> .
<Algeria>   rdf:type    <wordnet_country_108544813> .
<Andre_Agassi>  rdf:type    <wordnet_player_110439851> .
<Austro-Asiatic_languages>  rdf:type    <wordnet_language_106282651> .
<Afroasiatic_languages> rdf:type    <wordnet_language_106282651> .
<Andorra>   rdf:type    <wordnet_country_108544813> .
<Animal_Farm>   rdf:type    <wordnet_novelette_106368962> .
<Alaska>    rdf:type    <wordnet_country_108544813> .
<Aldous_Huxley> rdf:type    <wordnet_writer_110794014> .
<Andrei_Tarkovsky>  rdf:type    <wordnet_film_maker_110088390> .

Solution

Suppose you've got data like this:

@prefix : <http://stackoverflow.com/q/24313367/1281433/> .

:City1 a :City .
:City2 a :City .

:Country1 a :Country .
:Country2 a :Country .
:Country3 a :Country .

:River1 a :River .
:River2 a :River .
:River3 a :River .

:City1 :isCapitalOf :Country1 .

:River1 :isPartOf :Country1, :Country2 .
:River2 :isPartOf :Country2, :Country3 .

:River1 :passesThrough :City1, :City2 .
:River2 :passesThrough :City2 .

Then this query gives you the kind of results you want, I think:

prefix : <http://stackoverflow.com/q/24313367/1281433/>

select ?type1 ?p ?type2 (count(distinct *) as ?count) where {
   [ a ?type1 ; ?p [ a ?type2 ] ] 
}
group by ?type1 ?p ?type2

----------------------------------------------
| type1  | p              | type2    | count |
==============================================
| :River | :passesThrough | :City    | 3     |
| :City  | :isCapitalOf   | :Country | 1     |
| :River | :isPartOf      | :Country | 4     |
----------------------------------------------
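
As a rough cross-check of what the query computes, here is a minimal Python sketch that runs the same join and grouping over the sample data using only the standard library. The names types, edges and type_histogram are my own for illustration and not part of any SPARQL library. It assumes the rdf:type triples are kept in a dict, separate from the other edges.

```python
from collections import Counter

# rdf:type triples from the sample data, as node -> set of types.
types = {
    "City1": {"City"}, "City2": {"City"},
    "Country1": {"Country"}, "Country2": {"Country"}, "Country3": {"Country"},
    "River1": {"River"}, "River2": {"River"}, "River3": {"River"},
}

# The remaining triples, as (subject, predicate, object) tuples.
edges = [
    ("City1", "isCapitalOf", "Country1"),
    ("River1", "isPartOf", "Country1"), ("River1", "isPartOf", "Country2"),
    ("River2", "isPartOf", "Country2"), ("River2", "isPartOf", "Country3"),
    ("River1", "passesThrough", "City1"), ("River1", "passesThrough", "City2"),
    ("River2", "passesThrough", "City2"),
]

def type_histogram(types, edges):
    """Count edges per (subject type, predicate, object type) pattern."""
    hist = Counter()
    for s, p, o in set(edges):  # set() drops duplicate triples, like count(distinct *)
        # Nodes without a type contribute nothing, as in the query above.
        for t1 in types.get(s, ()):
            for t2 in types.get(o, ()):
                hist[(t1, p, t2)] += 1
    return hist

histogram = type_histogram(types, edges)
for (t1, p, t2), n in histogram.most_common():
    print(f"{t1} -> {p} -> {t2}: {n}")
```

A node with several types is counted once per type, which is also how the SPARQL query behaves.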

If you're not too comfortable with the [ … ] blank node syntax, it might help to see the expanded form:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT  ?type1 ?p ?type2 (count(distinct *) AS ?count)
WHERE
  { _:b0 rdf:type ?type1 .
    _:b0 ?p _:b1 .
    _:b1 rdf:type ?type2
  }
GROUP BY ?type1 ?p ?type2

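This only catches things that have types, though. If you want to include things that don't have an rdf:type, make the type patterns optional; ?type1 or ?type2 is then left unbound for untyped nodes. Note that ?y can now also match literals, so add filter(!isLiteral(?y)) inside the braces if you still want to exclude them: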

SELECT  ?type1 ?p ?type2 (count(distinct *) AS ?count) { 
    ?x ?p ?y
    optional { ?x a ?type1 }
    optional { ?y a ?type2 }
}
GROUP BY ?type1 ?p ?type2
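
To illustrate the effect of the optional patterns, here is a small Python sketch (standard library only) in which an untyped node keeps its edge and gets None in place of the unbound type. The mini-graph, including Lake1 and flowsInto, is hypothetical and not from the sample data. Unlike the query, this sketch stores rdf:type triples separately, so they do not show up as edges in the histogram.

```python
from collections import Counter

# Hypothetical mini-graph: Lake1 has no rdf:type.
types = {"River1": {"River"}, "City1": {"City"}}
edges = [
    ("River1", "passesThrough", "City1"),
    ("River1", "flowsInto", "Lake1"),
]

def type_histogram_optional(types, edges):
    """Count edges per type pattern, keeping edges whose endpoints are untyped."""
    hist = Counter()
    for s, p, o in set(edges):
        # "or [None]" plays the role of OPTIONAL: the edge is kept and
        # the missing type is left as None (unbound in SPARQL terms).
        for t1 in types.get(s) or [None]:
            for t2 in types.get(o) or [None]:
                hist[(t1, p, t2)] += 1
    return hist

histogram = type_histogram_optional(types, edges)
for pattern, n in histogram.items():
    print(pattern, n)
```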