
Practical usage for linked data


I've been reading about linked data, and I think I understand the basics of publishing it, but I'm trying to find real-world, practical (and best-practice) usage for linked data. Many books and online tutorials talk a lot about RDF and SPARQL, but not about dealing with other people's data.

My question is, if I have a project with a bunch of data that I output as RDF, what is the best way to enhance (or correctly use) other people's data?

If I create an application about animals and I want to use data from the BBC wildlife pages (http://www.bbc.co.uk/nature/life/Snow_Leopard), what should I do? Crawl the BBC wildlife pages for RDF and save the contents to my own triplestore? Query the BBC with SPARQL (I'm not sure this is actually possible with the BBC)? Or do I take the URI for my animal (owl:sameAs) and curl the content from the BBC website?

This also raises the question: can you programmatically add linked data? I imagine you would have to crawl the BBC wildlife pages unless they provide an index of all their content.

If I wanted to add extra information, such as a location for these animals (http://www.geonames.org/2950159/berlin.html), what is considered the best approach? Something like owl:habitat (a made-up predicate) pointing at Berlin, and then curl the RDF for Berlin from the GeoNames site?

I imagine that linking to the original author's data is the best way, because your data can then be kept up to date; judging by these slides from a BBC presentation (http://www.slideshare.net/metade/building-linked-data-applications), that is what the BBC does. But what if the author's website goes down or is too slow? And if you were to index the author's RDF, I imagine your owl:sameAs would point to a local copy of that RDF.


Solution

  • Here's one potential way of creating and consuming linked data.

    1. If you are looking for an entity (i.e., a 'Resource' in Linked Data terminology) online, see if there is a Linked Data description of it. One easy place to find one is DBpedia. For the snow leopard, the resource URI is http://dbpedia.org/resource/Snow_leopard, whose human-readable view is http://dbpedia.org/page/Snow_leopard. As you can see from that page, there are several object and property descriptions, which you can use to build a rich information platform.
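
    For example, here is a minimal sketch of dereferencing that URI with Python and RDFLib, assuming DBpedia serves an RDF serialization via content negotiation:

      # Fetch the RDF description of a resource by dereferencing its URI.
      from rdflib import Graph, URIRef

      snow_leopard = URIRef("http://dbpedia.org/resource/Snow_leopard")

      g = Graph()
      # RDFLib asks the server for an RDF serialization (e.g. RDF/XML).
      g.parse("http://dbpedia.org/resource/Snow_leopard")

      # Print every property/value pair DBpedia publishes for the resource.
      for predicate, obj in g.predicate_objects(subject=snow_leopard):
          print(predicate, obj)
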
    2. You can use SPARQL in two ways. First, you can directly query a public SPARQL endpoint that might hold the data you need. The BBC had one for music; I'm not sure whether they offer endpoints for other domains. DBpedia can be queried using snorql. Second, you can retrieve the data you need from these endpoints and load it into your own triplestore using the INSERT and INSERT DATA features of SPARQL 1.1; to reach the remote endpoints from your triplestore, you use the SERVICE keyword of SPARQL 1.1. The second approach protects you from being unable to run your queries whenever a publicly available endpoint is down for maintenance.
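
    As an illustration of the first approach, here is a hedged sketch that queries DBpedia's public endpoint with the SPARQLWrapper Python library (the endpoint URL and the dbo:abstract property are DBpedia's; substitute whatever endpoint and vocabulary your source actually provides):

      from SPARQLWrapper import SPARQLWrapper, JSON

      sparql = SPARQLWrapper("http://dbpedia.org/sparql")
      sparql.setQuery("""
          PREFIX dbo: <http://dbpedia.org/ontology/>
          SELECT ?abstract WHERE {
              <http://dbpedia.org/resource/Snow_leopard> dbo:abstract ?abstract .
              FILTER (lang(?abstract) = "en")
          }
      """)
      sparql.setReturnFormat(JSON)

      results = sparql.query().convert()
      for binding in results["results"]["bindings"]:
          print(binding["abstract"]["value"])

    For the second approach, the same pattern would sit inside a SPARQL 1.1 update run against your own store, with the remote part wrapped in SERVICE <http://dbpedia.org/sparql> { ... }.
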
    3. To programmatically add the data to your triplestore, you can use one of the existing libraries. In Python, RDFLib is well suited to such applications.
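
    For instance, here is a minimal RDFLib sketch that crawls one page's RDF and keeps a local copy; the .rdf suffix for BBC wildlife pages is an assumption about how the BBC publishes its nature data, so verify it before relying on it:

      from rdflib import Graph

      g = Graph()
      # Assumed URL: check the BBC page's response headers or documentation first.
      g.parse("http://www.bbc.co.uk/nature/life/Snow_Leopard.rdf", format="xml")

      # Persist a local copy so your application is not tied to the BBC's uptime.
      g.serialize(destination="snow_leopard_bbc.ttl", format="turtle")
      print(f"Loaded {len(g)} triples")
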
    4. To enrich your data with data sourced from elsewhere, there are again two approaches. The standard one is to reuse existing vocabularies: look for a suitable habitat predicate and simply insert this statement:

      dbpedia:Snow_leopard prefix:habitat geonames:Berlin .

    If no existing ontology contains an appropriate property (which is unlikely in this case), one needs to create a new ontology.
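
    In code, the same statement can be asserted with a SPARQL 1.1 INSERT DATA update. Below is a sketch against a local RDFLib graph; dbo:habitat stands in for whatever habitat property your chosen vocabulary actually provides, and http://sws.geonames.org/2950159/ is GeoNames' Linked Data URI for Berlin:

      from rdflib import Graph

      g = Graph()
      # Assert the enrichment triple via a SPARQL 1.1 update.
      g.update("""
          PREFIX dbr: <http://dbpedia.org/resource/>
          PREFIX dbo: <http://dbpedia.org/ontology/>
          INSERT DATA {
              dbr:Snow_leopard dbo:habitat <http://sws.geonames.org/2950159/> .
          }
      """)

      print(g.serialize(format="turtle"))
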

    5. If you want to keep your information current, it makes sense to re-run your queries periodically. Something such as DBpedia Live is useful in this regard.
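
    As a closing sketch, here is a naive refresh loop that re-fetches the remote description on a schedule and overwrites the local cache; DBpedia Live's change feeds would let you do this incrementally, and the interval here is arbitrary:

      import time
      from rdflib import Graph

      SOURCE = "http://dbpedia.org/resource/Snow_leopard"
      REFRESH_SECONDS = 24 * 60 * 60  # re-sync once a day

      while True:
          fresh = Graph()
          fresh.parse(SOURCE)
          # Replace the cached copy wholesale with the latest remote data.
          fresh.serialize(destination="snow_leopard_cache.ttl", format="turtle")
          time.sleep(REFRESH_SECONDS)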