
Where can I find a dataset of literal values already annotated with DBpedia properties (those whose range is float or int)?


I'm working on a project that maps DBpedia concepts to the columns of tabular data. Specifically, I want to map literal columns (numerical values: float, int, ...). For that I need enough data to build a background knowledge base. I extracted some data from the T2D-golden-dataset in the format given at the end of this description. However, I should really keep that data as a benchmark for testing, and it contains fewer than 20 such columns across all of its tables. Could anyone help me find a dataset of literal values annotated with DBpedia properties?

Literal-valued DBpedia ranges:

"http://www.w3.org/2001/XMLSchema#float"
"http://www.w3.org/2001/XMLSchema#integer"
"http://www.w3.org/2001/XMLSchema#positiveInteger"
"http://www.w3.org/2001/XMLSchema#integer"

Some properties with these ranges:

"http://dbpedia.org/ontology/speaker",
"http://dbpedia.org/ontology/ranking",
"http://dbpedia.org/ontology/humanDevelopmentIndex",
"http://dbpedia.org/ontology/numberOfPlatformLevels",
"http://dbpedia.org/ontology/enginePower",
"http://dbpedia.org/ontology/graySubject",
"http://dbpedia.org/ontology/shareOfAudience",
"http://dbpedia.org/ontology/percentageLiteracyWomen",.........

The samples I need to find, or somehow generate, are arrays of values keyed by the properties given above. For example:

 "http://dbpedia.org/ontology/enginePower" : ["220", "125", "1300",....],
 "http://dbpedia.org/ontology/humanDevelopmentIndex" : ["0.34", "0.78", "0.98", ...]

I don't need that exact format. It would be great if I could find a sufficient number of data tables annotated for DBpedia, like those in the T2D-golden-dataset.


Solution

  • This query starts you down the road, as it gets you 100 typed literal values for <http://dbpedia.org/ontology/populationTotal>, which are all typed as <http://www.w3.org/2001/XMLSchema#nonNegativeInteger> --

    PREFIX  dbo:  <http://dbpedia.org/ontology/>
    
    SELECT DISTINCT ?value
    WHERE 
      { ?subject dbo:populationTotal ?value } 
    LIMIT 100
    

    This rather more complex (and expensive) query gets you something like the end result I think you want -- but you will need to run it a number of times, for a few predicates at a time, to get everything you're asking for from the public endpoint. If needed, you could spin up your own DBpedia mirror instance in the AWS cloud, and adjust Virtuoso's timeouts and other limits, in order to build and run one query that would deliver one giant result set.

    PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
    PREFIX  dbo:  <http://dbpedia.org/ontology/>
    
    SELECT ?predicate ?value_type
           ( GROUP_CONCAT ( DISTINCT ?value_str ; separator=", " ) AS ?values )
    # For one row per value instead, select DISTINCT ?predicate ?value ?value_type ?value_str
    # and drop the GROUP BY clause.
    WHERE 
      { ?subject  ?predicate  ?value .
        VALUES ( ?predicate ) { ( dbo:numberOfPlatformLevels )
                                ( dbo:shareOfAudience )
                                ( dbo:populationTotal ) 
                              }
        BIND ( DATATYPE ( ?value ) AS ?value_type )
        BIND (      STR ( ?value ) AS ?value_str )
      } 
    GROUP BY ?predicate ?value_type
    ORDER BY ?predicate ?value_type
    LIMIT 1000
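
To run this "a few predicates at a time" and end up with the array-per-property shape from the question, something like the following Python sketch could work. It is only an illustration under some assumptions: it uses the public endpoint at `https://dbpedia.org/sparql`, it requests the standard SPARQL JSON results format, and it replaces the `GROUP_CONCAT` aggregation with a plain row-per-value query filtered by `isNumeric`, grouping the rows on the client instead. The helper names (`batches`, `build_query`, `pivot`, `fetch`, `collect`) are made up for this example.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public DBpedia endpoint; swap in your own mirror if you run one.
ENDPOINT = "https://dbpedia.org/sparql"

# Row-per-value variant of the query above (not the GROUP_CONCAT version).
QUERY_TEMPLATE = """\
SELECT ?predicate ?value
WHERE {{
  VALUES ?predicate {{ {predicates} }}
  ?subject ?predicate ?value .
  FILTER ( isNumeric(?value) )
}}
LIMIT {limit}
"""


def batches(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(predicates, limit=1000):
    """Build one SPARQL query for a small batch of full predicate IRIs."""
    terms = " ".join(f"<{iri}>" for iri in predicates)
    return QUERY_TEMPLATE.format(predicates=terms, limit=limit)


def pivot(results):
    """Turn SPARQL JSON results into {predicate IRI: [value strings]}."""
    table = defaultdict(list)
    for row in results["results"]["bindings"]:
        table[row["predicate"]["value"]].append(row["value"]["value"])
    return dict(table)


def fetch(query, endpoint=ENDPOINT, timeout=60):
    """Send one query over HTTP GET and return the decoded JSON document."""
    request = Request(
        f"{endpoint}?{urlencode({'query': query})}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def collect(predicates, batch_size=3, limit=1000):
    """Query the endpoint batch by batch and merge the per-predicate lists."""
    merged = {}
    for batch in batches(predicates, batch_size):
        for predicate, values in pivot(fetch(build_query(batch, limit))).items():
            merged.setdefault(predicate, []).extend(values)
    return merged


# Example call (needs network access, so it is left commented out):
# data = collect(["http://dbpedia.org/ontology/enginePower",
#                 "http://dbpedia.org/ontology/humanDevelopmentIndex"])
```

Note that `LIMIT` applies to each batch as a whole, so a very common predicate such as `dbo:populationTotal` can crowd out the others in the same batch; setting `batch_size=1` avoids that at the cost of more requests.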