I've been reading about linked data and I think I understand the basics of publishing linked data, but I'm trying to find real-world, practical (and best-practice) uses for linked data. Many books and online tutorials talk a lot about RDF and SPARQL, but not about dealing with other people's data.
My question is, if I have a project with a bunch of data that I output as RDF, what is the best way to enhance (or correctly use) other people's data?
If I create an application for animals and I want to use data from the BBC wildlife page (http://www.bbc.co.uk/nature/life/Snow_Leopard), what should I do? Crawl the BBC wildlife page for RDF and save the contents to my own triplestore, query the BBC with SPARQL (I'm not sure that this is actually possible with the BBC), or take the URI for my animal (linked via owl:sameAs) and curl the content from the BBC website?
This also raises the question: can you add linked data programmatically? I imagine you would have to crawl the BBC wildlife pages unless they provide an index of all their content.
If I wanted to add extra information, such as a location for these animals (e.g. http://www.geonames.org/2950159/berlin.html), what is considered the best approach? Something like owl:habitat (a made-up predicate) pointing to Brazil, and then curling the RDF for Brazil from the GeoNames site?
I imagine that linking to the original author is the best way, because your data can then be kept up to date, which, judging by these slides from a BBC presentation (http://www.slideshare.net/metade/building-linked-data-applications), is what the BBC does. But what if the author's website goes down or is too slow? And if you were to index the author's RDF, I imagine your owl:sameAs would point to a local copy of that RDF.
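For illustration, the kind of link I have in mind would be something like this (the ex: namespace is made up; only the BBC URI is real):

# ex: is a made-up local namespace; the BBC URI is the page mentioned above.
@prefix ex:  <http://example.org/animals/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

ex:snow_leopard owl:sameAs <http://www.bbc.co.uk/nature/life/Snow_Leopard> .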
Here's one potential way of creating and consuming linked data.
A good place to start is the DBpedia description of the snow leopard: http://dbpedia.org/page/Snow_leopard. As you can see from the page, there are several object and property descriptions, and you can use them to create a rich information platform. There are two ways to consume this data. First, you can query the public endpoints directly; DBpedia's snorql interface is a convenient way to explore the endpoint and test queries.
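For example, a minimal exploratory query against the public DBpedia SPARQL endpoint (http://dbpedia.org/sparql) might look like this; the properties returned are simply whatever DBpedia currently holds for the resource:

# List everything DBpedia asserts about the snow leopard resource.
PREFIX dbpedia: <http://dbpedia.org/resource/>

SELECT ?property ?value
WHERE {
  dbpedia:Snow_leopard ?property ?value .
}
LIMIT 100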
Secondly, you can retrieve the data you need from these endpoints and load it into your own triple store using the INSERT and INSERT DATA features of SPARQL 1.1 Update. To access the remote SPARQL endpoints from your triple store, you will need to use the SERVICE feature of SPARQL 1.1 federated queries. The second approach protects you from being unable to execute your queries when a publicly available endpoint is down for maintenance.
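As a sketch of that second approach (assuming your store allows SERVICE inside an update; some stores require you to run the remote query separately and insert the results), with a made-up local graph name:

# Copy the remote DBpedia triples about the snow leopard into a local
# graph; <http://example.org/animals> is a made-up graph name.
PREFIX dbpedia: <http://dbpedia.org/resource/>

INSERT {
  GRAPH <http://example.org/animals> { dbpedia:Snow_leopard ?p ?o }
}
WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    dbpedia:Snow_leopard ?p ?o .
  }
}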
To enrich the data with data sourced from elsewhere, there are again two approaches. The standard way is to reuse existing vocabularies, so you would look for a suitable habitat predicate and just insert this statement:
dbpedia:Snow_leopard prefix:habitat geonames:Berlin .
If no appropriate ontologies are found to contain the property (which is unlikely in this case), one needs to create a new ontology.
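Concretely, the insertion could look something like this sketch: ex:habitat is a placeholder (like prefix:habitat above) for whatever real property you find or mint, and the GeoNames URI is the Berlin example from the question (GeoNames publishes its RDF under sws.geonames.org, but check the exact URI the site gives you):

# ex:habitat is a placeholder predicate; swap in a real property from
# an existing vocabulary if you find one.
PREFIX ex:      <http://example.org/vocab#>
PREFIX dbpedia: <http://dbpedia.org/resource/>

INSERT DATA {
  GRAPH <http://example.org/animals> {
    dbpedia:Snow_leopard ex:habitat <http://sws.geonames.org/2950159/> .
  }
}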