rdf sparql jena dbpedia virtuoso

Why are the results returned by the Virtuoso SPARQL endpoint and Jena different?


I queried DBpedia both through the Virtuoso SPARQL endpoint and through Jena, but the results are different. My query is:

SELECT (COUNT(DISTINCT (?v)) AS ?num)
FROM <http://dbpedia.org>
WHERE {
  ?x  <http://dbpedia.org/property/deathPlace>  ?v .
  ?v  rdf:type                                  ?t .
  FILTER STRSTARTS( STR(?t), STR("http://dbpedia.org/ontology/Place") )
}

I execute the query in Jena with this function:

public static ArrayList<String> query(String queryStr) {
    ArrayList<String> result = new ArrayList<>();
    queryStr = SPARQL_PREFIX + queryStr;
    Query query = QueryFactory.create(queryStr);

    // Remote execution.
    try (QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query)) {
        // Set the DBpedia specific timeout.
        ((QueryEngineHTTP) qexec).addParam("timeout", "10000");

        // Execute.
        ResultSet rs = qexec.execSelect();
        while (rs.hasNext()) {
            result.add(rs.next().toString());
        }
    } catch (Exception e) {
        e.printStackTrace();
        System.err.println("============================================");
        System.err.println(queryStr);
        System.err.println("============================================");
    }
    return result;
}

I set the graph to search in the FROM clause, but the results are still different. When I execute the query on Virtuoso's SPARQL endpoint, the result is 21482, but the result returned by Jena is 9586.

Any idea why?


Solution

  • As AKSW and Taylor mentioned in the comments, DBpedia applies different limits to remote queries than to queries launched from its website. Here, the string matching in the FILTER (an expensive operation) makes the query slow enough to hit that limit, so Jena receives only part of the actual result, which is why its count is lower.

    To solve this, match the class URI directly instead of comparing its string form. This replaces the prefix test with an exact match on the class, which is much cheaper to evaluate:

    SELECT  (COUNT(DISTINCT (?v)) AS ?num)
      FROM  <http://dbpedia.org>
     WHERE 
       {
         ?x  <http://dbpedia.org/property/deathPlace>  ?v   .
         ?v  rdf:type  <http://dbpedia.org/ontology/Place>  .
       }
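
For reference, the remote request that a client sends for this query can be sketched with the JDK alone. This is a minimal sketch, assuming Java 11 or later and using `java.net.http` rather than Jena. The class name `DbpediaCount`, the helpers `buildUrl` and `fetch`, and the 30000 ms timeout value are hypothetical choices for illustration. The check of the `X-SQL-State` response header is an assumption about how Virtuoso flags a result cut short by a timeout; it is not confirmed by the answer above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class DbpediaCount {
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    // Exact-match version of the query from the answer. The rdf: prefix is
    // declared here because this sketch does not use SPARQL_PREFIX.
    static final String QUERY =
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        + "SELECT (COUNT(DISTINCT ?v) AS ?num)\n"
        + "FROM <http://dbpedia.org>\n"
        + "WHERE {\n"
        + "  ?x <http://dbpedia.org/property/deathPlace> ?v .\n"
        + "  ?v rdf:type <http://dbpedia.org/ontology/Place> .\n"
        + "}";

    // Builds the GET URL: the query is form-encoded and the timeout is
    // passed as an extra parameter, as the question does with addParam.
    static String buildUrl(String endpoint, String query, int timeoutMs) {
        return endpoint
            + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
            + "&timeout=" + timeoutMs;
    }

    // Sends the request and returns the raw response body.
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        // Assumption: a result truncated by the timeout carries this header.
        response.headers().firstValue("X-SQL-State")
            .ifPresent(s -> System.err.println("Possibly partial result, X-SQL-State: " + s));
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String url = buildUrl(ENDPOINT, QUERY, 30000);
        System.out.println(url);
        // Pass any argument to actually contact the endpoint.
        if (args.length > 0) {
            System.out.println(fetch(url));
        }
    }
}
```

Running it without arguments only prints the request URL, so the encoded query and timeout can be compared with what the website form sends. Running it with an argument contacts the endpoint and prints the response, plus a warning on stderr if the assumed header is present.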