
Why do different endpoints not query the same datasets?


I would like to query datasets such as FOAF and DBpedia. The aim is to run fairly simple queries such as "Which paintings did Magritte paint?" or "Which American actors played in American movies?"

So I wrote my queries and ran them with DBpedia SnorQL. Then, for other reasons, I tried Live DBpedia and OpenLinks demo.openlinksw.com, and discovered that the results differed depending on the endpoint.

Here are 2 examples:

  1. Returns results with DBpedia SnorQL, but with neither Live DBpedia nor OpenLinks demo.openlinksw.com

    # works of Magritte
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dbp: <http://dbpedia.org/property/>

    SELECT *
    WHERE {
      ?person a dbo:Artist .
      ?person foaf:surname "Magritte"@en .
      ?work dbo:author ?person .
      OPTIONAL {
        ?work dbp:year ?year ;
              dbo:museum ?museum .
      }
    }
    ORDER BY ?year

  2. Returns results with Live DBpedia, but with neither DBpedia SnorQL nor OpenLinks demo.openlinksw.com

       # American actors, from Willem Robert van Hage's R tutorial

SELECT ?actor ?movie ?director ?movie_date
       WHERE {
       ?m dc:subject <http://dbpedia.org/resource/Category:American_films> .
       ?m rdfs:label ?movie .
       FILTER(LANG(?movie) = "en")
       ?m dbp:released ?movie_date .
       FILTER(DATATYPE(?movie_date) = xsd:date)
       ?m dbp:starring ?a .
       ?a rdfs:label ?actor .
       FILTER(LANG(?actor) = "en")
       ?m dbp:director ?d .
       ?d rdfs:label ?director .
       FILTER(LANG(?director) = "en")
       }
       LIMIT 1000

I thought an endpoint was simply a tool for querying any dataset, whatever it is. So I assumed that I could query DBpedia and FOAF from dbpedia, live dbpedia, or openlinks demo.openlinksw.com.

I have read that different endpoints actually use different datasets, but I don't understand why, since the query gives specific URIs to reach.

Why does the same query return different results depending on the SPARQL endpoint?


Solution

  • Much like different instances of SQL DBMS (such as [in no particular order and without implication of endorsement] OpenLink Virtuoso, Oracle, MySQL, Informix, SQL Server, Sybase, DB2, PostgreSQL, Ingres, Progress OpenEdge, and many others) hold different data, different instances (read: SPARQL endpoints) of RDF DBMS, also known as Quadstores or Triplestores (such as [in no particular order and without implication of endorsement] OpenLink Virtuoso, AllegroGraph, Stardog, Neo4J, MarkLogic, and many others) hold different data.

    You cannot query Joe's database in DBMS A through Fred's database in DBMS B -- unless someone has already told Fred's database and/or DBMS about Joe's database and/or DBMS (e.g., VDBMS functionality), or you include some information about Joe's database and/or DBMS in your query (e.g., SPARQL Federation), etc.

    (A "DBMS" is a Database Management System, such as listed above. A "database" is a collection of data, typically stored in a [large] document, which is managed by a DBMS.)
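As a rough illustration of that point, here is a toy sketch in Python. It is not how any real triplestore works; the two sets, the `match` helper, and the `ex:` identifiers are all made up for this example. Each set stands in for the data loaded behind one endpoint, and the same pattern is evaluated against both.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def match(store: Set[Triple], s: Optional[str], p: Optional[str],
          o: Optional[str]) -> List[Triple]:
    """Return the triples in `store` that fit the pattern.

    None acts as a wildcard, playing the role of a SPARQL variable.
    """
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Hypothetical data behind "endpoint A": the authorship triple was loaded.
endpoint_a: Set[Triple] = {
    ("ex:Magritte", "foaf:surname", "Magritte"),
    ("ex:The_Son_of_Man", "dbo:author", "ex:Magritte"),
}

# Hypothetical data behind "endpoint B": same person, but no dbo:author triples.
endpoint_b: Set[Triple] = {
    ("ex:Magritte", "foaf:surname", "Magritte"),
}

# The same "query" (one triple pattern) run against both stores.
pattern = (None, "dbo:author", "ex:Magritte")
results_a = match(endpoint_a, *pattern)
results_b = match(endpoint_b, *pattern)

print(results_a)  # one matching triple
print(results_b)  # empty list
```

The pattern is identical in both calls; only the data each store happens to hold differs, which is all it takes for the results to differ.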

    Of particular note relative to your question --

    • FOAF is an ontology, a vocabulary, which is used to describe entities.
    • DBpedia is a dataset (which has had various versions over time), and a project, and an organization, and various other things (the ambiguity of literal identifiers!).
    • OpenLink Software (not "openlinks") is a company which produces OpenLink Virtuoso, among other data-related software and services, and which provides a number of live endpoints on the web -- including the main DBpedia endpoint. (ObDisclaimer: OpenLink Software is also my employer.)
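To tie this back to the endpoints in the question: under the SPARQL 1.1 Protocol, a query is sent to one specific endpoint URL, and only the store behind that URL answers it. Here is a minimal Python sketch, assuming the three services are reachable at the `/sparql` paths below. Those URLs and the helper names `build_request` and `count_rows` are my assumptions for illustration, and availability may vary.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint URLs; check each service before relying on them.
ENDPOINTS = {
    "DBpedia": "https://dbpedia.org/sparql",
    "Live DBpedia": "https://live.dbpedia.org/sparql",
    "OpenLink demo": "https://demo.openlinksw.com/sparql",
}

# Full IRIs are used so the query does not depend on predefined prefixes.
QUERY = """
SELECT ?work WHERE {
  ?work <http://dbpedia.org/ontology/author> ?person .
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def count_rows(endpoint: str, query: str, timeout: float = 30.0) -> int:
    """Send the query to one endpoint and count result rows (needs network access)."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return len(json.load(response)["results"]["bindings"])
```

Calling `count_rows(url, QUERY)` for each entry in `ENDPOINTS` sends the identical query text to three different databases, so the counts need not agree. The URIs inside the query only name things; they do not tell an endpoint to fetch data it has not loaded.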