I need to extract information about articles (e.g., abstract, thumbnail) that are located in the nested subcategories of a given category (e.g., History). How can I do that with a SPARQL query? Or what is the best way to do it in Python with a few SPARQL subqueries?
This gets all ?sc "subcategories" that are recursively (or transitively) narrower than "History", up to a depth of 3. I implemented that with the {minDepth,maxDepth} notation that Virtuoso understands. Other triplestores may not understand it. I have also added English-language filtering on string literals, while still retaining triples with IRIs for ?o.
SELECT ?sc ?lab ?p ?o
WHERE {
  ?sc skos:broader{1,3} <http://dbpedia.org/resource/Category:History> .
  optional {?sc rdfs:label ?lab } .
  ?sc ?p ?o
  filter (lang(?lab) = "en")
  filter ((lang(?o) = "en") || isURI(?o))
}
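Since the question also asks about Python, here is a minimal sketch of how a query like the one above could be sent from a script using only the standard library. Several things here are assumptions rather than part of the original answer: the endpoint URL (the public DBpedia endpoint), the `format` request parameter, the reliance on the endpoint predefining the `skos:` and `rdfs:` prefixes, the added `LIMIT 100`, and the helper names `build_request_url`, `rows_from_results`, and `run_query`.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia endpoint, which runs Virtuoso.
ENDPOINT = "https://dbpedia.org/sparql"

# Same query as above; LIMIT 100 is an addition to keep a first test run small.
# Assumption: the endpoint predefines the skos: and rdfs: prefixes.
QUERY = """
SELECT ?sc ?lab ?p ?o
WHERE {
  ?sc skos:broader{1,3} <http://dbpedia.org/resource/Category:History> .
  optional {?sc rdfs:label ?lab } .
  ?sc ?p ?o
  filter (lang(?lab) = "en")
  filter ((lang(?o) = "en") || isURI(?o))
}
LIMIT 100
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Return a GET URL asking the endpoint for SPARQL JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return endpoint + "?" + params


def rows_from_results(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=60):
    """Send the query over HTTP and return the flattened result rows."""
    url = build_request_url(query, endpoint)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(QUERY)` would then return one dict per result row, keyed by variable name. Variables that are unbound in a row are simply absent from that row's dict.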
Additionally, that query reports all of the triples with ?sc as the subject. I didn't see any abstracts (using <http://dbpedia.org/ontology/abstract> as predicate?) or any thumbnail relationships. You can confirm that by projecting only distinct ?p, or even counting:
SELECT ?p (count(?p) as ?pcount)
WHERE {
  ?sc skos:broader{1,3} <http://dbpedia.org/resource/Category:History> .
  optional {?sc rdfs:label ?lab } .
  ?sc ?p ?o
  filter (lang(?lab) = "en")
  filter ((lang(?o) = "en") || isURI(?o))
}
group by ?p
order by desc(?pcount)
If you recurse deeper, you will find some abstracts. But deep recursion is slow, and I feel like I'm conceptually missing something.
SELECT *
WHERE {
  ?sc skos:broader{5,7} <http://dbpedia.org/resource/Category:History> .
  ?sc <http://dbpedia.org/ontology/abstract> ?a
}
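One possible explanation for the missing abstracts, offered as an assumption rather than something the queries above confirm: in DBpedia, abstracts and thumbnails are normally attached to article resources, and articles point to their categories through dct:subject. Under that assumption, the following Python sketch builds a query that joins each subcategory to its articles and then asks for the article's abstract and thumbnail. The function names `build_article_query` and `queries_by_depth`, the `limit` parameter, and the use of `dbo:thumbnail` are illustrative choices, and the {min,max} notation is still Virtuoso-specific.

```python
CATEGORY_NS = "http://dbpedia.org/resource/Category:"


def build_article_query(category, min_depth=1, max_depth=3, limit=100):
    """Build a query for articles filed under subcategories of `category`.

    Assumption: articles are linked to categories via dct:subject.
    """
    if not 1 <= min_depth <= max_depth:
        raise ValueError("expected 1 <= min_depth <= max_depth")
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?article ?abstract ?thumbnail
WHERE {{
  ?sc skos:broader{{{min_depth},{max_depth}}} <{CATEGORY_NS}{category}> .
  ?article dct:subject ?sc ;
           dbo:abstract ?abstract .
  OPTIONAL {{ ?article dbo:thumbnail ?thumbnail }}
  FILTER (lang(?abstract) = "en")
}}
LIMIT {limit}"""


def queries_by_depth(category, max_depth=3, limit=100):
    """Return one query per depth level, so each request stays small."""
    return [
        build_article_query(category, depth, depth, limit)
        for depth in range(1, max_depth + 1)
    ]
```

For example, `queries_by_depth("History", 3)` yields three query strings, one for each depth from 1 to 3, which can be sent to the endpoint one at a time with any SPARQL client. Whether splitting by depth is actually faster than a single {1,3} query is untested here. Articles filed directly under the top category (depth 0) are not covered by this sketch.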