java sparql wikidata sesame

How to access the Wikidata SPARQL interface from Java?


I am trying to query all instances of an entity from Wikidata. I found out that currently the only way to do this is to use the SPARQL endpoint.

I found an example query that does roughly what I want and ran it successfully from the web interface. Unfortunately, I can't get it to run from my Java code. I am using the OpenRDF Sesame SPARQL library. Here is the relevant code:

SPARQLRepository sparqlRepository = new SPARQLRepository(
        "https://query.wikidata.org/");
SPARQLConnection sparqlConnection = new SPARQLConnection(
        sparqlRepository);

String query = "SELECT ?s ?desc ?authorlabel (COUNT(DISTINCT ?sitelink) as ?linkcount) WHERE {"
        + "?s wdt:P31 wd:Q571 ."
        + "?sitelink schema:about ?s ."
        + "?s wdt:P50 ?author . "
        + "OPTIONAL { ?s rdfs:label ?desc filter (lang(?desc) = \"en\"). }"
        + "OPTIONAL {"
        + "?author rdfs:label ?authorlabel filter (lang(?authorlabel) = \"en\")."
        + "}"
        + "} GROUP BY ?s ?desc ?authorlabel ORDER BY DESC(?linkcount)";

TupleQuery tupleQuery = sparqlConnection.prepareTupleQuery(
        QueryLanguage.SPARQL, query);
System.out.println("Result for tupleQuery" + tupleQuery.evaluate());

And here is the response I'm receiving:

Exception in thread "main" org.openrdf.query.QueryEvaluationException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.9.4</center>
</body>
</html>
    at org.openrdf.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:59)
    at main.Test.main(Test.java:72)
Caused by: org.openrdf.repository.RepositoryException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.9.4</center>
</body>
</html>
    at org.openrdf.http.client.HTTPClient.handleHTTPError(HTTPClient.java:953)
    at org.openrdf.http.client.HTTPClient.sendTupleQueryViaHttp(HTTPClient.java:718)
    at org.openrdf.http.client.HTTPClient.getBackgroundTupleQueryResult(HTTPClient.java:602)
    at org.openrdf.http.client.HTTPClient.sendTupleQuery(HTTPClient.java:367)
    at org.openrdf.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:52)
    ... 1 more

Normally I would assume that this means I need an API key of sorts, but the Wikidata API appears to be completely open. Did I make a mistake setting up my connection?


Solution

  • The correct endpoint URL for Wikidata is https://query.wikidata.org/sparql; your URL is missing the trailing /sparql path.

    In addition, I noticed a few glitches in your code. First of all, you're doing this:

    SPARQLConnection sparqlConnection = new SPARQLConnection(sparqlRepository);
    

    It should be this instead:

    RepositoryConnection sparqlConnection = sparqlRepository.getConnection();
    

    Always retrieve your connection object from the Repository object using getConnection() - this means resources are shared and the Repository can close 'dangling' connections if necessary.

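    Secondly, you can't print the result of a query like this, because evaluate() returns a TupleQueryResult that has to be iterated; concatenating it to a string only calls the result object's toString() and does not show the bindings: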

    System.out.println("Result for tupleQuery" + tupleQuery.evaluate());
    

    If you wish to print out the result to System.out you should instead do something like this:

    tupleQuery.evaluate(new SPARQLResultsTSVWriter(System.out));
    

    Or, if you wish to customize the output a bit more:

    for (BindingSet bs : QueryResults.asList(tupleQuery.evaluate())) {
        System.out.println(bs);
    }
    

    For what it's worth: with the above changes the request goes through, but your query appears to be too 'heavy' for Wikidata; at least, I got a timeout error from the server. Try a simpler query and you'll see that the code works.
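
To illustrate the points about the endpoint path and a simpler query, here is a minimal sketch that uses only the JDK, not the Sesame API. It assumes the endpoint accepts the standard SPARQL protocol GET form, where the query text is passed percent-encoded in a "query" parameter. The class name WikidataRequestSketch, the helper method names, and the ten-row sample query are made up for illustration; the query only reuses the wdt:P31 wd:Q571 pattern from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

/** Illustrative sketch: build the GET URL for a small SPARQL query. */
final class WikidataRequestSketch {

    // Endpoint named in the answer; note the trailing "/sparql" path.
    static final String ENDPOINT = "https://query.wikidata.org/sparql";

    // A deliberately small query (hypothetical example): ten instances of wd:Q571.
    // Every fragment ends with a space so that concatenation cannot fuse
    // two tokens together (for example "?author" + "OPTIONAL").
    static String simpleQuery() {
        return "SELECT ?s WHERE { "
                + "?s wdt:P31 wd:Q571 . "
                + "} LIMIT 10";
    }

    // Percent-encode the query text and attach it as the "query" parameter.
    static URI buildRequestUri(String endpoint, String sparql) {
        try {
            return URI.create(endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Prints the full request URL; nothing is sent over the network.
        System.out.println(buildRequestUri(ENDPOINT, simpleQuery()));
    }
}
```

Opening the printed URL in a browser should be a quick way to check that the endpoint and the query are accepted before wiring the same strings into SPARQLRepository.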