I want to analyze the structure of a graph, and one query I would like to try extracts the different combinations of subject type - edge type - object type that occur in it.
This is a follow up from a couple of earlier questions of mine:
How to generate all triples that fit a particular node type or/and edge type using SPARQL query?
For example: If there is a semantic graph with edge types(property/predicate types) as
And if the node types are like:
Then I should get:
and so on...
Note: The object must not be a literal, because I want unit subgraph patterns of the form (subject type, edge type, object type).
To summarize: I think the way I'd approach this would be:
a) Compute the distinct subject types in the graph
b) Compute the distinct edge types in the graph
c) Compute the distinct object types in the graph
(a, b, and c have been answered in my previous questions.)
d) Generate all combinations of subject type -> edge type -> object type (no literals) that occur, together with the count of each pattern (like a histogram)
Hope the question is articulated reasonably well.
Edit: Adding sample data (a few rows from the entire dataset). It is the YAGO dataset, which is publicly available.
<Alabama> rdf:type <wordnet_country_108544813> .
<Abraham_Lincoln> rdf:type <wordnet_president_110467179> .
<Aristotle> rdf:type <wordnet_writer_110794014> .
<Academy_Award_for_Best_Art_Direction> rdf:type <wordnet_award_106696483> .
<Academy_Award> rdf:type <wordnet_award_106696483> .
<Actrius> rdf:type <wordnet_movie_106613686> .
<Animalia_(book)> rdf:type <wordnet_book_106410904> .
<Ayn_Rand> rdf:type <wordnet_novelist_110363573> .
<Allan_Dwan> rdf:type <wikicategory_American_film_directors> .
<Algeria> rdf:type <wordnet_country_108544813> .
<Andre_Agassi> rdf:type <wordnet_player_110439851> .
<Austro-Asiatic_languages> rdf:type <wordnet_language_106282651> .
<Afroasiatic_languages> rdf:type <wordnet_language_106282651> .
<Andorra> rdf:type <wordnet_country_108544813> .
<Animal_Farm> rdf:type <wordnet_novelette_106368962> .
<Alaska> rdf:type <wordnet_country_108544813> .
<Aldous_Huxley> rdf:type <wordnet_writer_110794014> .
<Andrei_Tarkovsky> rdf:type <wordnet_film_maker_110088390> .
Suppose you've got data like this:
@prefix : <http://stackoverflow.com/q/24313367/1281433/> .
:City1 a :City .
:City2 a :City .
:Country1 a :Country .
:Country2 a :Country .
:Country3 a :Country .
:River1 a :River .
:River2 a :River .
:River3 a :River .
:City1 :isCapitalOf :Country1 .
:River1 :isPartOf :Country1, :Country2 .
:River2 :isPartOf :Country2, :Country3 .
:River1 :passesThrough :City1, :City2 .
:River2 :passesThrough :City2 .
Then this query gives you the kind of results you want, I think:
prefix : <http://stackoverflow.com/q/24313367/1281433/>
select ?type1 ?p ?type2 (count(distinct *) as ?count) where {
  [ a ?type1 ; ?p [ a ?type2 ] ]
}
group by ?type1 ?p ?type2
----------------------------------------------
| type1  | p              | type2    | count |
==============================================
| :River | :passesThrough | :City    |     3 |
| :City  | :isCapitalOf   | :Country |     1 |
| :River | :isPartOf      | :Country |     4 |
----------------------------------------------
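If it helps to see what the pattern is actually counting, here is a minimal pure-Python sketch (not SPARQL, and no RDF library) that replays the same join over the sample data. The tuple encoding of the triples and the function name `pattern_histogram` are just my illustration, not part of any API.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# The sample data from above, written out as (subject, predicate, object) tuples.
TRIPLES = [
    (":City1", RDF_TYPE, ":City"),
    (":City2", RDF_TYPE, ":City"),
    (":Country1", RDF_TYPE, ":Country"),
    (":Country2", RDF_TYPE, ":Country"),
    (":Country3", RDF_TYPE, ":Country"),
    (":River1", RDF_TYPE, ":River"),
    (":River2", RDF_TYPE, ":River"),
    (":River3", RDF_TYPE, ":River"),
    (":City1", ":isCapitalOf", ":Country1"),
    (":River1", ":isPartOf", ":Country1"),
    (":River1", ":isPartOf", ":Country2"),
    (":River2", ":isPartOf", ":Country2"),
    (":River2", ":isPartOf", ":Country3"),
    (":River1", ":passesThrough", ":City1"),
    (":River1", ":passesThrough", ":City2"),
    (":River2", ":passesThrough", ":City2"),
]


def pattern_histogram(triples):
    """Count (subject type, predicate, object type) patterns in a list of triples."""
    # First pass: remember every rdf:type of every node.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    # Second pass: each edge contributes one row per (subject type, object type) pair.
    # Edges whose subject or object has no type contribute nothing, as in the query.
    counts = Counter()
    for s, p, o in set(triples):  # a graph is a set, so drop duplicate triples
        for t1 in types.get(s, ()):
            for t2 in types.get(o, ()):
                counts[(t1, p, t2)] += 1
    return counts


histogram = pattern_histogram(TRIPLES)
for (t1, p, t2), n in sorted(histogram.items()):
    print(t1, p, t2, n)
```

This gives the same three patterns and counts as the result table. Note that a node with two types contributes a row for each of them, which is also what the query does.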
If you're not too comfortable with the [ … ] blank node syntax, it might help to see the expanded form:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type1 ?p ?type2 (count(distinct *) AS ?count)
WHERE
  { _:b0 rdf:type ?type1 .
    _:b0 ?p _:b1 .
    _:b1 rdf:type ?type2
  }
GROUP BY ?type1 ?p ?type2
This only catches things that have types, though. If you want to include things that don't have rdf:types, you'd want to do:
SELECT ?type1 ?p ?type2 (count(distinct *) AS ?count) {
  ?x ?p ?y
  optional { ?x a ?type1 }
  optional { ?y a ?type2 }
}
GROUP BY ?type1 ?p ?type2
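One thing to watch with this version: it matches every triple, so literal objects come back in (the question asked to leave them out), and the rdf:type triples themselves appear as rows with no ?type2. In SPARQL, the standard isLiteral function in a filter such as filter(!isLiteral(?y)) should take care of the literals. As a rough cross-check of the semantics, here is a plain-Python sketch of the same idea. The `Literal` marker class, the `skip_literals` flag, and the tiny sample list are hypothetical names I made up for illustration.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"


class Literal(str):
    """Hypothetical marker for literal values; IRIs stay plain strings."""


def pattern_histogram_optional(triples, skip_literals=True):
    """Count (subject type, predicate, object type) patterns, keeping untyped nodes."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    counts = Counter()
    for s, p, o in set(triples):
        if skip_literals and isinstance(o, Literal):
            continue  # plays the role of filter(!isLiteral(?y))
        # Like OPTIONAL: an untyped node still yields one row, with None as its type.
        for t1 in types.get(s) or {None}:
            for t2 in types.get(o) or {None}:
                counts[(t1, p, t2)] += 1
    return counts


# Made-up sample: one typed edge, one edge to an untyped node, one literal.
SAMPLE = [
    (":River1", RDF_TYPE, ":River"),
    (":City1", RDF_TYPE, ":City"),
    (":River1", ":passesThrough", ":City1"),
    (":River1", ":passesThrough", ":Village1"),   # :Village1 has no rdf:type
    (":River1", ":name", Literal("Some River")),  # literal object
]

histogram = pattern_histogram_optional(SAMPLE)
for key, n in sorted(histogram.items(), key=repr):
    print(key, n)
```

Here the edge to the untyped :Village1 is counted under (":River", ":passesThrough", None), and the :name triple is dropped unless you pass skip_literals=False.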