Tags: sparql, jena, arq

How to query multiple tables using ARQ jena?


Overview

I am using ARQ to query local RDF files. The query runs over five files:

  • a_m.nt, description.nt, labels.nt, links.nt, literals.nt

Information is modeled as a set of triples:

  • subject predicate object

Algorithm

First, I want to select specific topics from a_m.nt. Second, I want to select the labels and descriptions of those topics from description.nt and labels.nt. In other words, I want to search description.nt and labels.nt for the descriptions and labels whose topic is one of those extracted from a_m.nt. Finally, I want to extract the remaining properties of those topics from links.nt and literals.nt.
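Each of these steps is a lookup keyed on the topic, so the whole procedure amounts to a join on the topic's IRI. Here is a minimal Python sketch of the three steps, with plain sets of tuples standing in for the five files. The sample data and predicate names are made up for illustration; they are not taken from the real files.

```python
# Made-up sample data standing in for the five files: each file is a set of
# (subject, predicate, object) tuples. Abbreviations such as ":foo" stand in
# for full IRIs, and the predicate names are assumptions.
a_m_nt = {(":foo", "rdf:type", ":music"), (":baz", "rdf:type", ":film")}
description_nt = {(":foo", "rdf:description", '"this is foo"')}
labels_nt = {(":foo", "rdf:label", '"foo"'), (":baz", "rdf:label", '"baz"')}
links_nt = {(":foo", ":relatedTo", ":baz")}
literals_nt = {(":foo", ":year", '"1999"')}

# Step 1: select the topics of type music from a_m.nt.
topics = {s for (s, p, o) in a_m_nt if p == "rdf:type" and o == ":music"}

# Step 2: keep only descriptions and labels whose subject is a selected topic.
topic_descriptions = {(s, o) for (s, p, o) in description_nt if s in topics}
topic_labels = {(s, o) for (s, p, o) in labels_nt if s in topics}

# Step 3: collect the remaining properties of those topics.
other_properties = {t for t in links_nt | literals_nt if t[0] in topics}

print(topics)        # {':foo'}
print(topic_labels)  # {(':foo', '"foo"')}
print(other_properties)
```

":baz" has a label but is not of type music, so step 2 drops it. Every step filters on the same subject, which is what a single shared ?topic variable expresses in a SPARQL query.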


Query

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?x ?y ?p ?o
where { 
?topic rdf:type music. 
?topic rdf:description ?x.
?topic rdf:label ?y. 
?topic ?p ?o. 
}

Command line

sparql --data a_m.nt --data description.nt --data labels.nt --data links.nt --data literals.nt --query query_sparql
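As far as I know, the `sparql` command loads every `--data` file of a triple format into the same default graph. The file boundaries therefore disappear, and one triple pattern can join data that came from different files. The toy Python sketch below mimics that merge. The inline file contents are hypothetical, and `parse_simple_ntriples` is a made-up helper that only handles simple one-triple-per-line input, not the full N-Triples grammar.

```python
# Hypothetical file contents (real files would be read from disk).
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
FILES = {
    "a_m.nt": f"<http://example.org/foo> {RDF_TYPE} <http://example.org/music> .\n",
    "description.nt": '<http://example.org/foo> <http://example.org/description> "this is foo" .\n',
    "labels.nt": '<http://example.org/foo> <http://example.org/label> "foo" .\n',
}


def parse_simple_ntriples(text):
    """Split simple '<s> <p> o .' lines into (s, p, o) tuples.

    Hypothetical helper, NOT a full N-Triples parser: it does not handle
    escape sequences or other corner cases of the grammar."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.endswith("."):
            line = line[:-1].rstrip()  # drop the terminating dot
        s, p, o = line.split(None, 2)  # maxsplit=2 keeps spaces inside the object
        triples.add((s, p, o))
    return triples


# Like repeating --data: pour every file into one default graph.
default_graph = set()
for text in FILES.values():
    default_graph |= parse_simple_ntriples(text)

# One lookup on the subject now sees triples that came from different files.
foo = "<http://example.org/foo>"
props = {p: o for (s, p, o) in default_graph if s == foo}
print(props[RDF_TYPE])                             # <http://example.org/music>
print(props["<http://example.org/description>"])  # "this is foo"
```

For real files, use a proper RDF parser rather than line splitting.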

Question

With this query, I first select the topics that have type music, and then I select their descriptions, labels, and other properties. Is that correct?


Solution

  • In your current query, you don't seem to need all those triple patterns in the where clause, since the last pattern, ?topic ?p ?o, already retrieves every property of the topic, including its description and label. You also need to put music in a proper namespace (it is an IRI, not a variable), and you probably want to add DISTINCT to the select clause. So maybe rewrite the query like this:

    PREFIX : <http://example.org/>
    select DISTINCT ?topic ?p ?o
    where { 
      ?topic a :music. 
      ?topic ?p ?o. 
    }
    

    A possible result may be:

    <foo> <type> <music>
    <foo> <description> "this is foo"
    <foo> <label> "foo"
    <bar> <type> <music>
    <bar> <label> "bar"
    

    This is more general than your query: you get back everything of type music, along with all of its properties and values. Your query only returns topics that have a description and a label (and are of type music), along with all of their properties and values:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX : <http://example.org/>
    select ?x ?y ?p ?o
    where { 
      ?topic rdf:type :music. 
      ?topic rdf:description ?x.
      ?topic rdf:label ?y. 
      ?topic ?p ?o. 
    }
    

    Think of it as a table with ?x ?y ?p ?o being the column headers. A possible result may be:

    "this is foo" "foo" <type> <music>
    "this is foo" "foo" <description> "this is foo"
    "this is foo" "foo" <label> "foo"
    

    etc.

    The right query depends on how your data is organised. Are there other properties in description.nt and labels.nt that you want to keep out of the results? If so, you may want to load that data into a named graph and extract only the descriptions and labels from that graph in your query. An arbitrary example:

    SELECT ?a ?b
    FROM <A>
    FROM NAMED <B>
    WHERE
    {
      ?x a <foo> .
      GRAPH <B> { ?x ?a ?b }
    }
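You can check the effect of each query in this answer on a toy in-memory graph. The minimal Python sketch below makes several assumptions. `match` is a hypothetical naive nested-loop matcher that illustrates the join semantics of basic graph patterns; it is not how ARQ evaluates queries. The data is made up and uses abbreviated IRIs. The FROM <A> / FROM NAMED <B> dataset is modelled as two separate Python sets.

```python
def match(patterns, triples, binding=None):
    """Hypothetical toy matcher for a basic graph pattern (not ARQ's engine).

    Yields every variable binding that satisfies all (s, p, o) patterns
    against `triples`; pattern terms starting with '?' are variables."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind a new variable, or reject a clash with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no break: this triple fits the pattern, so join the rest
            yield from match(rest, triples, candidate)


# Made-up merged default graph; "rdf:type" plays the role of `a`.
graph = {
    (":foo", "rdf:type", ":music"),
    (":foo", "rdf:description", '"this is foo"'),
    (":foo", "rdf:label", '"foo"'),
    (":bar", "rdf:type", ":music"),
    (":bar", "rdf:label", '"bar"'),
}

# SELECT DISTINCT ?topic ?p ?o WHERE { ?topic a :music . ?topic ?p ?o . }
general = [("?topic", "rdf:type", ":music"), ("?topic", "?p", "?o")]
general_rows = {(b["?topic"], b["?p"], b["?o"]) for b in match(general, graph)}

# The asker's query: description and label are required patterns.
strict = [
    ("?topic", "rdf:type", ":music"),
    ("?topic", "rdf:description", "?x"),
    ("?topic", "rdf:label", "?y"),
    ("?topic", "?p", "?o"),
]
strict_rows = [(b["?x"], b["?y"], b["?p"], b["?o"]) for b in match(strict, graph)]

print(len(general_rows))  # 5: all properties of :foo and :bar
print(len(strict_rows))   # 3: :bar has no description, so it drops out

# SELECT ?a ?b FROM <A> FROM NAMED <B> WHERE { ?x a <foo> . GRAPH <B> { ?x ?a ?b } }
graph_a = {(":x1", "rdf:type", "<foo>"), (":x1", ":other", '"only in A"')}
graph_b = {(":x1", "rdf:label", '"x1"'), (":x1", "rdf:description", '"about x1"')}
named_rows = set()
for b in match([("?x", "rdf:type", "<foo>")], graph_a):  # default graph
    # Patterns inside GRAPH <B> are matched against graph B only.
    for b2 in match([("?x", "?a", "?b")], graph_b, b):
        named_rows.add((b2["?a"], b2["?b"]))

print(sorted(named_rows))
```

Every row of `strict_rows` repeats the same ?x and ?y values, which is the table view from the answer. `named_rows` never contains `:other`, because that triple lives only in the default graph. To build such a dataset from files on the command line, the `sparql` tool has, as far as I know, a `--namedGraph` option; check `sparql --help` for your version.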