I am trying to query all instances of an entity from Wikidata. I found out that currently the only way to do this is to use the SPARQL endpoint.
I found an example query that does roughly what I want and ran it successfully from the web interface. Unfortunately, I can't get it to run from my Java code. I am using the openRDF SPARQL library. Here is the relevant code:
SPARQLRepository sparqlRepository = new SPARQLRepository(
"https://query.wikidata.org/");
SPARQLConnection sparqlConnection = new SPARQLConnection(
sparqlRepository);
String query = "SELECT ?s ?desc ?authorlabel (COUNT(DISTINCT ?sitelink) as ?linkcount) WHERE {"
+ "?s wdt:P31 wd:Q571 ."
+ "?sitelink schema:about ?s ."
+ "?s wdt:P50 ?author . "
+ "OPTIONAL { ?s rdfs:label ?desc filter (lang(?desc) = \"en\"). }"
+ "OPTIONAL {"
+ "?author rdfs:label ?authorlabel filter (lang(?authorlabel) = \"en\")."
+ "}"
+ "} GROUP BY ?s ?desc ?authorlabel ORDER BY DESC(?linkcount)";
TupleQuery tupleQuery = sparqlConnection.prepareTupleQuery(
QueryLanguage.SPARQL, query);
System.out.println("Result for tupleQuery" + tupleQuery.evaluate());
And here is the response I'm receiving:
Exception in thread "main" org.openrdf.query.QueryEvaluationException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.9.4</center>
</body>
</html>
at org.openrdf.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:59)
at main.Test.main(Test.java:72)
Caused by: org.openrdf.repository.RepositoryException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.9.4</center>
</body>
</html>
at org.openrdf.http.client.HTTPClient.handleHTTPError(HTTPClient.java:953)
at org.openrdf.http.client.HTTPClient.sendTupleQueryViaHttp(HTTPClient.java:718)
at org.openrdf.http.client.HTTPClient.getBackgroundTupleQueryResult(HTTPClient.java:602)
at org.openrdf.http.client.HTTPClient.sendTupleQuery(HTTPClient.java:367)
at org.openrdf.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:52)
... 1 more
Normally I would assume that this means I need an API key of sorts, but the Wikidata API appears to be completely open. Did I make a mistake setting up my connection?
The proper endpoint URL for Wikidata is https://query.wikidata.org/sparql - you're missing the last bit.
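To make the difference between the two URLs concrete, here is a minimal sketch using only the JDK (no openRDF classes). The class and method names are made up for illustration, and the assumption is that the endpoint accepts the standard SPARQL protocol GET form, where the query text is passed in a `query` parameter:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of openRDF: shows which URL a SPARQL
// request should target and how the query text is attached to it.
final class EndpointUrlSketch {
    // Root of the site; this serves the web interface, not the endpoint.
    static final URI SITE_ROOT = URI.create("https://query.wikidata.org/");

    // The SPARQL endpoint is the "sparql" path under that root.
    static URI endpoint() {
        return SITE_ROOT.resolve("sparql");
    }

    // SPARQL protocol GET form: URL-encoded query text in the "query" parameter.
    static URI queryUri(String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint() + "?query=" + encoded);
    }

    public static void main(String[] args) {
        System.out.println(endpoint());
        System.out.println(queryUri("ASK { ?s ?p ?o }"));
    }
}
```

The string returned by endpoint() is what should be passed to the SPARQLRepository constructor. Requests sent to the site root reach the page that serves the web interface, which is presumably why nginx answers with 405.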
In addition, I noticed a few glitches in your code. First of all, you're doing this:
SPARQLConnection sparqlConnection = new SPARQLConnection(sparqlRepository);
This should be this:
RepositoryConnection sparqlConnection = sparqlRepository.getConnection();
Always retrieve your connection object from the Repository object using getConnection() - this means resources are shared and the Repository can close 'dangling' connections if necessary.
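Secondly, you can't print out the result of a query like this, because evaluate() returns a TupleQueryResult (an iterator over the results) rather than a printable string: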
System.out.println("Result for tupleQuery" + tupleQuery.evaluate());
If you wish to print out the result to System.out you should instead do something like this:
tupleQuery.evaluate(new SPARQLResultsTSVWriter(System.out));
Or (if you wish to customize the result a bit more):
for (BindingSet bs : QueryResults.asList(tupleQuery.evaluate())) {
System.out.println(bs);
}
For what it's worth: with the above changes the query request runs, but your query appears to be too 'heavy' for Wikidata - at least, I got a timeout error from the server. Try a simpler query, though, and you'll see the code works.
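If you want to check the endpoint independently of the openRDF library, a lightweight query can be sent with the JDK's own java.net.http client (Java 11+). This is only a sketch of a swapped-in technique, not the openRDF API: the class name, the User-Agent string and the sample query are illustrative assumptions, and the query relies on the wd: and wdt: prefixes being predefined by the Wikidata endpoint.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone check, independent of openRDF.
final class LightQuerySketch {
    // Deliberately small: a few instances of Q571, no labels, no grouping.
    static final String LIGHT_QUERY =
        "SELECT ?s WHERE { ?s wdt:P31 wd:Q571 } LIMIT 5";

    // Builds a GET request in the SPARQL protocol form (query parameter).
    static HttpRequest buildRequest(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
            // Ask for tab-separated values, like SPARQLResultsTSVWriter produces.
            .header("Accept", "text/tab-separated-values")
            // Placeholder value; replace with something describing your client.
            .header("User-Agent", "LightQuerySketch/0.1 (example)")
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request =
            buildRequest("https://query.wikidata.org/sparql", LIGHT_QUERY);
        // Sends the request over the network and prints status and body.
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Once a small query like this returns rows, you can add the label and sitelink parts of the original query back one at a time to see which one pushes it past the server's time limit.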