java, sparql, jena, similarity, dbpedia

Custom Jena filter function on a remote endpoint?


First, just so you know, I don't have a long computer science background and only started working with the semantic web this year, so I apologize in advance for any imprecise or non-scientific terms and any bad coding style in this question.

Here is my task: I want to find the DBpedia resources that are closest to labels I have previously extracted from some documents. To that end, I use a custom filter function (for example, a Dice coefficient calculation that returns a score between 0 and 1) to compute the similarity between the DBpedia labels and the extracted expression (I am using Apache Jena).

Ex1 : extracted : "bea systems" -> closest DBpedia label : "BAE Systems Inc.", etc.

Ex2 : extracted : "harper-collins publishing company" -> closest DBpedia labels : "Harper-Collins", "HarperCollins", "HarperCollins Publishers", etc.
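
For readers unfamiliar with the measure, here is a minimal sketch of a Dice coefficient over character bigrams in plain Java. It is only an assumption about what a function like DiceCoeff might compute (its source is not shown here); the class and method names are hypothetical, and a real Jena filter function would additionally need to be wrapped in Jena's function API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the DiceCoeff class from the question.
class DiceSketch {

    // Count the character bigrams of a lower-cased string.
    static Map<String, Integer> bigrams(String s) {
        Map<String, Integer> counts = new HashMap<>();
        String t = s.toLowerCase();
        for (int i = 0; i + 1 < t.length(); i++) {
            counts.merge(t.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    // Total number of bigrams, counting repeats.
    static int size(Map<String, Integer> counts) {
        int n = 0;
        for (int c : counts.values()) {
            n += c;
        }
        return n;
    }

    // Dice coefficient: 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|).
    // Returns a score between 0 (nothing shared) and 1 (same bigram multiset).
    static double dice(String a, String b) {
        Map<String, Integer> x = bigrams(a);
        Map<String, Integer> y = bigrams(b);
        int total = size(x) + size(y);
        if (total == 0) {
            return 0.0; // both strings are too short to have any bigram
        }
        int shared = 0;
        for (Map.Entry<String, Integer> e : x.entrySet()) {
            shared += Math.min(e.getValue(), y.getOrDefault(e.getKey(), 0));
        }
        return 2.0 * shared / total;
    }
}
```

With this sketch, DiceSketch.dice("night", "nacht") shares one bigram ("ht") out of eight in total and so scores 0.25, while two labels that differ only in case score 1.0.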

My problem is that I need to execute the query on a DBpedia endpoint because the dataset is huge (memory problems), but I get an HTTP 500 error, since my function is stored locally and I'm querying a remote endpoint...

Exception in thread "main" HttpException: 500
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.rewrap(HttpQuery.java:414)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:358)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:295)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:346)
at jena.example.similar.propfunction.DistanceTest.main(DistanceTest.java:48)

Here is my query code :

Node exp = NodeFactory.createLiteral("harper-collins publishing company") ;

String queryString = "" +
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
"PREFIX fn: <java:jena.example.similar.propfunction.> " +
"PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> " +
"SELECT  ?company ?label ?funcRes " +
"WHERE {" +
"?company a dbpedia-owl:Company . " +
"?company rdfs:label ?label . " +
"BIND (fn:DiceCoeff(?label, "+exp+") as ?funcRes) " +
"FILTER (lang(?label) = \"en\")" +
"}" +
"ORDER BY DESC(?funcRes) " +
"LIMIT 10 " ;

Query query = QueryFactory.create(queryString) ;

// execute the query
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query);
try {
    ResultSet results = qexec.execSelect() ;
    ResultSetFormatter.out(System.out, results, query) ;
} finally { qexec.close() ; }

The filter function itself works fine: I tested it with the same kind of query (i.e. using BIND and ORDER BY) on another, smaller dataset (not DBpedia) accessed locally, and it gave me the expected results.

So, is there any way to use a custom filter function on a remote endpoint? If not, what other options are there for the task I'm doing? (I've read the discussion in "How I can write SPARQL query that uses similarity measures in Java Code", but it doesn't seem to be the best fit for me.)

I would appreciate any suggestions from the community :)


Solution

  • A custom function is only registered and available locally. Unless the remote service also understands the function, it won't work against that service: you will either get an error like the one you see, or unbound values for the custom function.

    What you can try is using the SERVICE clause to send only part of your query to DBpedia and evaluate the custom filter function locally. This probably won't perform well, but it does let you use the custom function, e.g.

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX fn: <java:jena.example.similar.propfunction.>
    PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
    SELECT  ?company ?label ?funcRes
    WHERE 
    {
      SERVICE <http://dbpedia.org/sparql>
      {
        ?company a dbpedia-owl:Company . 
        ?company rdfs:label ?label .
        FILTER (lang(?label) = "en")
      }
      BIND (fn:DiceCoeff(?label, "exp") as ?funcRes)
    }
    ORDER BY DESC(?funcRes)
    LIMIT 10
    

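    This query can be run locally: it first queries DBpedia remotely to fetch the companies and their English labels, then computes the custom function locally, and finally sorts by the computed values. Because ORDER BY and LIMIT are applied locally, every matching label has to be transferred from DBpedia first, which is why performance may suffer.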

    You then need to modify your code so that you run the query against a local dataset (an empty in-memory one is enough), e.g.

    QueryExecution qexec = QueryExecutionFactory.create(query, DatasetFactory.createMem());
    

    The other alternative, if this doesn't perform well enough for you, is to download the DBpedia data dumps and load them into a local TDB database so that you can run the queries entirely locally. See "Load DBpedia locally using Jena TDB?" for some information on how to do this.