
Sesame workbench querying an online data set


I have the following SPARQL query:

PREFIX ab: <http://learningsparql.com/ns/addressbook#>

SELECT ?firstName ?lastName ?streetAddress ?city ?region ?postalCode
FROM <http://www.learningsparql.com/2ndeditionexamples/ex041.ttl>
WHERE
{
  ?s ab:firstName ?firstName ;
     ab:lastName ?lastName ;
     ab:address ?address .

  ?address ab:postalCode ?postalCode ;
           ab:city ?city ;
           ab:streetAddress ?streetAddress ;
           ab:region ?region .
}

When I run this query with Apache ARQ, it works fine: the selected variables (?firstName, ?lastName, etc.) are populated with data from ex041.ttl at the URI given in the FROM clause. When I run the same query from the Sesame Workbench, the variables are empty and I don't know why. It is as if Sesame is not connecting to the remote ex041.ttl file. The remote data set ex041.ttl contains the following data:

# filename: ex041.ttl

@prefix ab: <http://learningsparql.com/ns/addressbook#> .

ab:i0432 ab:firstName    "Richard" ;
         ab:lastName      "Mutt" ;
         ab:homeTel       "(229) 276-5135" ;
         ab:email         "[email protected]" ;
         ab:address       _:b1 .

_:b1    ab:postalCode    "49345" ;
        ab:city          "Springfield" ;
        ab:streetAddress "32 Main St." ;
        ab:region        "Connecticut" .

You can access this file by typing its URI into a browser.


Solution

  • The ARQ tool has misled you into believing that the FROM clause in your SPARQL query tells the engine to retrieve the data from the file at that URL. That is non-standard behavior for SPARQL, and in fact most SPARQL engines will not do this (more on this below).

    The Sesame Workbench is a client application for a Sesame Server. A Sesame Server, in turn, is a database manager application: it allows you to create and access Sesame RDF databases (a.k.a. 'repositories') via the Web, and it also exposes them as SPARQL endpoints.

    To query RDF data from the Sesame Workbench using SPARQL, you need one of these conditions to be fulfilled:

    1. The RDF data you wish to query is stored in a Sesame repository on your Sesame Server; or
    2. The RDF data you wish to query is accessible via some remote SPARQL endpoint.

    However, you cannot directly query an RDF file that is merely downloadable from somewhere on the Web, as your query attempts to do.

    To query this data using the Sesame Workbench, you should create a repository on your server and load the data from the file into this repository. You can then execute your query on this repository via the Workbench. Alternatively, if the RDF data is accessible via some already-existing SPARQL endpoint that you know the address of, you can query it from the Workbench with the use of a SERVICE-clause.
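
    As a rough sketch of the first option: if your repository accepts SPARQL 1.1 Update requests (for example through the Workbench's SPARQL Update page, if your version has one), a LOAD operation can fetch the file and store it in a named graph. Reusing the file's URL as the graph identifier is a choice made here for illustration, and it is an assumption that the server can reach and parse the remote file:

    ```sparql
    # SPARQL 1.1 Update: fetch the remote Turtle file and store its triples
    # in a named graph. The graph identifier is chosen to match the URL in
    # the question's FROM clause (an illustrative choice, not a requirement).
    LOAD <http://www.learningsparql.com/2ndeditionexamples/ex041.ttl>
    INTO GRAPH <http://www.learningsparql.com/2ndeditionexamples/ex041.ttl>
    ```

    With the data stored under that graph identifier, the FROM clause in the original query would refer to a graph the repository actually contains.

    For the second option, a SERVICE clause might look like the following. The endpoint address http://example.org/sparql is a placeholder, not a real endpoint that serves this data:

    ```sparql
    PREFIX ab: <http://learningsparql.com/ns/addressbook#>

    SELECT ?firstName ?lastName
    WHERE {
      # Placeholder endpoint: replace with the address of a real SPARQL endpoint.
      SERVICE <http://example.org/sparql> {
        ?s ab:firstName ?firstName ;
           ab:lastName  ?lastName .
      }
    }
    ```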

    Some further background on the meaning of the FROM clause: a SPARQL query, in general, is evaluated by a SPARQL engine. Normally, any SPARQL engine has a certain default dataset on which it executes the queries it receives (in the case of Sesame Workbench, that dataset is equal to the contents of the Sesame database on which you execute it).

    The FROM and FROM NAMED clauses are instructions to the SPARQL engine to query only a specific part of the (default) dataset: the values in these clauses are the identifiers for so-called named graphs, essentially subsets of the total dataset. Sesame databases are so-called quad stores, meaning that they (optionally) store a named graph identifier with every RDF statement you add (which turns the standard RDF 'triple' into a 'quad'). In a quad store such as Sesame you can use the FROM clause to restrict your query to such a subset of the total database.
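
    To illustrate the difference between the two clauses, here is a sketch that assumes the repository already contains a named graph identified by the ex041.ttl URL (this only holds if the data was added under that identifier). FROM NAMED makes the graph available by name, and the GRAPH keyword then matches patterns inside it:

    ```sparql
    PREFIX ab: <http://learningsparql.com/ns/addressbook#>

    SELECT ?firstName ?city
    # Make the graph available as a named graph in the query's dataset.
    FROM NAMED <http://www.learningsparql.com/2ndeditionexamples/ex041.ttl>
    WHERE {
      # Patterns inside GRAPH are matched only against that named graph.
      GRAPH <http://www.learningsparql.com/2ndeditionexamples/ex041.ttl> {
        ?s ab:firstName ?firstName ;
           ab:address   ?address .
        ?address ab:city ?city .
      }
    }
    ```

    With plain FROM instead, the graph's contents become the default graph of the query, so the patterns can be written without the GRAPH keyword, as in the question.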

    So using these clauses is like applying a 'zoom filter' to your query: you're instructing the engine to only look at a specific subset of all available data. But if the identifier of that subset is not known in the dataset you're querying, your query will not return any results.
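
    One way to check which subset identifiers a repository actually knows about is to list its named graphs. This is a minimal sketch in standard SPARQL; on a large repository it may be slow, since it touches every statement that has a graph identifier:

    ```sparql
    # List the identifiers of all named graphs that contain at least one statement.
    SELECT DISTINCT ?g
    WHERE {
      GRAPH ?g { ?s ?p ?o }
    }
    ```

    If the URL used in a FROM clause does not appear in this list, a query restricted to it will come back empty, which matches the behavior described in the question.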