Tags: sparql, rdf, dbpedia

The total number of classes and properties in DBpedia


Okay, this seems like a really basic question, but for some reason I am not able to figure it out. I have the DBpedia 2014 OWL file from here. When I load it in Protégé and look at the Ontology metrics tab, I see that the class count is 814, the object property count is 1310, and the data property count is 1725. Are these the right numbers? Out of curiosity, I tried to check the numbers on the Virtuoso endpoint. For the query

select ?p (count(?p) as ?totalCount) where {?s ?p ?o } group by ?p order by DESC(?totalCount)

i.e. trying to find the properties and the total number of times each appears in the graph, I get a total of 10,000 rows. I am not sure whether this is the right way to find the properties and the number of times they appear in the graph.

For classes, when I issue this query:

SELECT ?class 
WHERE {
   ?class rdf:type rdfs:Class.
}

I don't get any results at all. Using the default query in Virtuoso, i.e.

Select count(distinct ?Concept) where {[] a ?Concept}

I get 369857. So I am a bit confused. Is this large number due to the graph containing concepts from YAGO, UMBEL, schema.org and PURL, or am I looking at something wrongly? Are concepts completely different from classes (interpreted in some way I haven't thought of)?

Honestly, I got waylaid by these numbers because I needed them to calculate the selectivity as defined in this paper.

There they say that for a triple pattern, the selectivity of a subject is 1/R, where R is the number of resources. So do the resources mean the class count, the concept count, or the count of ?s in the ?s ?p ?o triple pattern?


Solution

  • The DBpedia ontology contains just the axioms for classes and properties with the namespace http://dbpedia.org/ontology.

    The DBpedia SPARQL endpoint contains much more data:

    First, it contains triples with properties that have the namespace http://dbpedia.org/property . Those properties are untyped (i.e. of type rdf:Property), which means that the value can be either a resource or a literal. In OWL we have typed properties, i.e. object and data properties.
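    To make the typed/untyped distinction concrete, here is a minimal Python sketch over a hypothetical in-memory list of triples (the property dbp:exampleCapital and all values are invented for illustration, not taken from DBpedia). It checks which kinds of values a property actually has: only resources would fit an OWL object property, only literals a data property, and a mix fits neither, which is what an untyped rdf:Property allows. As an assumed convention, literals are the strings that start with a double quote.

```python
# Hypothetical toy data: (subject, property, object) triples.
# Assumed convention: literals start with a double quote, e.g. '"Berlin"'.
TRIPLES = [
    ("dbr:Germany", "dbp:exampleCapital", "dbr:Berlin"),
    ("dbr:ExampleCountry", "dbp:exampleCapital", '"Example City"'),
    ("dbr:Berlin", "dbo:populationTotal", '"1000"'),
]


def value_kinds(triples, prop):
    """Return the set of value kinds ('resource' and/or 'literal') used with prop."""
    kinds = set()
    for _, p, o in triples:
        if p == prop:
            kinds.add("literal" if o.startswith('"') else "resource")
    return kinds


def owl_property_kind(triples, prop):
    """Which OWL property type the observed values would be compatible with."""
    kinds = value_kinds(triples, prop)
    if kinds == {"resource"}:
        return "object property"
    if kinds == {"literal"}:
        return "data property"
    return "neither: mixed or unused (rdf:Property only)"


if __name__ == "__main__":
    for prop in ("dbp:exampleCapital", "dbo:populationTotal"):
        print(prop, "->", owl_property_kind(TRIPLES, prop))
```

    With this toy data, dbp:exampleCapital mixes a resource and a literal, so it could only be declared as a plain rdf:Property.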

    Other information loaded into the SPARQL endpoint includes, among others, links to external datasets like YAGO or the upper-level ontology UMBEL. You can find more details here [1], [2].

    By the way, you can see this easily from the results of your first query: there are many more properties, from many different namespaces.
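    One way to see this is to group the property IRIs returned by the first query by namespace. Below is a minimal Python sketch that assumes the IRIs have already been fetched into a list (the sample list is illustrative). It uses a common heuristic rather than any formal rule: the namespace is everything up to and including the last '#' or '/'.

```python
from collections import Counter

# Illustrative sample of property IRIs, as the first query might return them.
PROPERTY_IRIS = [
    "http://dbpedia.org/ontology/birthPlace",
    "http://dbpedia.org/property/name",
    "http://dbpedia.org/property/birthPlace",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://www.w3.org/2000/01/rdf-schema#label",
]


def namespace_of(iri):
    """Heuristic: everything up to and including the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def count_by_namespace(iris):
    """Count how many of the given property IRIs fall into each namespace."""
    return Counter(namespace_of(iri) for iri in iris)


if __name__ == "__main__":
    for ns, n in count_by_namespace(PROPERTY_IRIS).most_common():
        print(n, ns)
```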

    Regarding your first query: it is the correct query if you want the number of triples for each property. It returns only 10,000 rows because that is the default result-set limit of the Virtuoso triple store into which DBpedia is loaded. For more results, you have to use pagination. The total number of distinct properties used in triples can be found with

    SELECT  (COUNT(DISTINCT ?p) AS ?cnt)
    WHERE
      { ?s ?p ?o}
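    Here is a minimal Python sketch of such pagination, assuming you page with LIMIT and OFFSET over the per-property count query from the question. The helper name paged_queries is hypothetical, and it only builds the query strings; sending them to the endpoint is left out. The extra ?p in ORDER BY is a tie-breaker, so that rows with equal counts keep a stable order across pages. Whether a particular endpoint accepts large OFFSET values depends on its configuration.

```python
# The per-property count query from the question, with a deterministic ordering.
PER_PROPERTY_COUNT = (
    "SELECT ?p (COUNT(?p) AS ?totalCount)\n"
    "WHERE { ?s ?p ?o }\n"
    "GROUP BY ?p\n"
    "ORDER BY DESC(?totalCount) ?p"
)


def paged_queries(page_size, pages):
    """Build `pages` queries (hypothetical helper), each selecting the next window of rows."""
    if page_size <= 0 or pages < 0:
        raise ValueError("page_size must be positive and pages non-negative")
    return [
        f"{PER_PROPERTY_COUNT}\nLIMIT {page_size}\nOFFSET {i * page_size}"
        for i in range(pages)
    ]


if __name__ == "__main__":
    for q in paged_queries(page_size=10000, pages=3):
        print(q.splitlines()[-1])  # OFFSET 0, then OFFSET 10000, then OFFSET 20000
```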
    

    Your second query, which asks for all classes of type rdf:Class, returns nothing because no class in DBpedia is of that type. For OWL ontologies it is more usual to query for classes of type owl:Class. The third query returns all resources that occur in the object position of rdf:type triples. This is slightly different because it works on the instance data: it returns all classes that are actually used in the data.
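    The difference between declared classes and used classes can be sketched in a few lines of Python over a hypothetical toy graph. The class yago:ExampleYagoClass is invented, and the two functions only mimic the two SPARQL queries on an in-memory list; they do not run against the endpoint.

```python
# Hypothetical toy graph mixing schema triples and instance triples.
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"

TRIPLES = [
    ("dbo:City", RDF_TYPE, OWL_CLASS),
    ("dbo:Person", RDF_TYPE, OWL_CLASS),
    ("dbr:Berlin", RDF_TYPE, "dbo:City"),
    ("dbr:Berlin", RDF_TYPE, "yago:ExampleYagoClass"),  # hypothetical external class
    ("dbr:Paris", RDF_TYPE, "dbo:City"),
]


def declared_classes(triples):
    # Analogue of: SELECT ?c WHERE { ?c rdf:type owl:Class }
    return {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}


def used_classes(triples):
    # Analogue of: SELECT DISTINCT ?c WHERE { [] a ?c }
    return {o for _, p, o in triples if p == RDF_TYPE}


if __name__ == "__main__":
    print(sorted(declared_classes(TRIPLES)))  # ['dbo:City', 'dbo:Person']
    print(sorted(used_classes(TRIPLES)))      # ['dbo:City', 'owl:Class', 'yago:ExampleYagoClass']
```

    In this toy graph, dbo:Person is declared but never used, the YAGO-style class is used but not declared, and owl:Class itself appears among the "used" classes because the schema triples live in the same graph.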

    As for your last question: I haven't read the paper, but a common choice in many research papers is the number of distinct subjects that use a given property.
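    The sketch below assumes R is the number of distinct subjects, which is one of the readings the question considers. The paper's own definition is not reproduced here. The code is Python over hypothetical toy triples. Against the endpoint, R would come from a query such as SELECT (COUNT(DISTINCT ?s) AS ?R) WHERE { ?s ?p ?o }, and the per-property variant restricts ?p to a given property.

```python
from fractions import Fraction

# Hypothetical toy triples; literals start with a double quote.
TRIPLES = [
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Berlin", "rdfs:label", '"Berlin"@en'),
    ("dbr:Paris", "dbo:country", "dbr:France"),
    ("dbr:Germany", "rdfs:label", '"Germany"@en'),
]


def subject_selectivity(triples):
    """1/R, with R assumed to be the number of distinct subjects in the graph."""
    r = len({s for s, _, _ in triples})
    if r == 0:
        raise ValueError("empty graph: selectivity is undefined")
    return Fraction(1, r)


def distinct_subjects_per_property(triples):
    """For each property, count how many distinct subjects use it."""
    subjects = {}
    for s, p, _ in triples:
        subjects.setdefault(p, set()).add(s)
    return {p: len(subs) for p, subs in subjects.items()}


if __name__ == "__main__":
    print(subject_selectivity(TRIPLES))             # 1/3
    print(distinct_subjects_per_property(TRIPLES))  # {'dbo:country': 2, 'rdfs:label': 2}
```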