I'm working on a project that tries to map DBpedia
concepts to table data columns. Specifically I wanted to map literal(numerical values; float, int..). Therefore I need adequate number of data to build a background knowledge base. I extract some data from T2D-golden-dataset
as the given format at the end of this description. Actually I should use them as a bench mark for testing and it only contains less than 20 columns from overall tables. Could anyone help me to find such a literal valued and dbpedia
annotated dataset ?
Literal valued dbpedia ranges;
"http://www.w3.org/2001/XMLSchema#float"
"http://www.w3.org/2001/XMLSchema#integer"
"http://www.w3.org/2001/XMLSchema#positiveInteger"
"http://www.w3.org/2001/XMLSchema#integer"
Some properties having these ranges;
"http://dbpedia.org/ontology/speaker",
"http://dbpedia.org/ontology/ranking",
"http://dbpedia.org/ontology/humanDevelopmentIndex",
"http://dbpedia.org/ontology/numberOfPlatformLevels",
"http://dbpedia.org/ontology/enginePower",
"http://dbpedia.org/ontology/graySubject",
"http://dbpedia.org/ontology/shareOfAudience",
"http://dbpedia.org/ontology/percentageLiteracyWomen",.........
Sample examples I need to found or somehow generate is an array corresponding to concepts given above. For an example;
"http://dbpedia.org/ontology/enginePower" : ["220", "125", "1300",....],
"http://dbpedia.org/ontology/humanDevelopmentIndex" : ["0.34", "0.78", "0.98", ...]
I don't need that exact format. It would be great If I can find enough number of data tables given as T2D golden dataset
for dbpedia
.
This query starts you down the road, as it gets you 100 typed literal values for <http://dbpedia.org/ontology/populationTotal>
, which are all typed as <http://www.w3.org/2001/XMLSchema#nonNegativeInteger>
--
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?value
WHERE
{ ?subject dbo:populationTotal ?value }
LIMIT 100
This rather more complex (and expensive) query gets you something like the end result I think you want -- but you will need to run it a number of times, for a few predicates at a time, to get everything you're asking for from the public endpoint. If needed, you could spin up your own DBpedia mirror instance in the AWS cloud, and adjust Virtuoso's timeouts and other limits, in order to build and run one query that would deliver one giant result set.
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT # DISTINCT ?predicate ?value ?value_type ?value_str
?predicate ?value_type ( GROUP_CONCAT ( DISTINCT ?value_str ; separator=", " ) AS ?values )
WHERE
{ ?subject ?predicate ?value
VALUES ( ?predicate ) { ( dbo:numberOfPlatformLevels )
( dbo:shareOfAudience )
( dbo:populationTotal )
}
BIND ( DATATYPE ( ?value ) AS ?value_type )
BIND ( STR ( ?value ) AS ?value_str )
}
GROUP BY ?predicate ?value_type
ORDER BY ?predicate ?value_type
LIMIT 1000