I'd like to discuss the first part of this Siri-like service.
Ideally, I'd like to be able to query for things like:
And get results like this:
{type:"film"}
{type:"composer"}
{type:"song"}
I care about nothing else, I find descriptions, images and general information utterly useless outside Wikipedia. I see Wikidata as a meta-data service that can provide me with the semantics of the text I search for.
Do all data structures have "types" or some kind of a property that has to do with its meaning? Is there a list of all the types? Is there a suggestions feature for entities that have double meaning like "apple"? Finally, how can I send a text query and read the "type" of the response data structure?
I know I'm not providing any code but I really can't wrap my head around Wikidata's API. I've searched everywhere and all I can't find are some crippled fetch examples and messed up Objective-C HTML parsers. I can't even get their "example query" page to work because of some error I don't understand.
Really newbie not-friendly and full of heavy terminology.
The problem with Wikidata's API is that it does not have a query interface. All it does is return information for a specific data item, if you already know the ID. We have simply not been able to build a query interface yet that is powerful enough and able to scale. There is an early beta of a SPARQL endpoint though: https://tools.wmflabs.org/ppp-sparql/.
Once that is up and running, we hope to provide easier to use services on top of this, like Magnus' WDQ http://magnusmanske.de/wordpress/?p=72.
(Edit to answer the concrete questions about the API:)
I've searched everywhere and all I can't find are some crippled fetch examples
Documentation could be nicer, but https://www.wikidata.org/wiki/Wikidata:Data_access is a good start. Also note that https://www.wikidata.org/w/api.php is self-documenting. In particular, have a look at https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities and https://www.wikidata.org/w/api.php?action=help&modules=wbsearchentities
Do all data structures have "types" or some kind of a property that has to do with its meaning?
All statements about a data item have to do with its meaning. Many have a statement about the "instance of" (P31) or "subclass of" (P279) property, which is pretty close to what you want, I suppose.
Is there a list of all the types?
No. Wikidata doesn't use a closed, pre-defined ontology to describe the world. It's a platform to describe the world collaboratively, in a machine readable way; from that, a fluid ontology emerges, which is never quite complete or consistent.
Any data item can serve as the class or suprt-class of another item. An item can be an instance or subclass of multiple classes. The relationships are quite complex.
Is there a suggestions feature for entities that have double meaning like "apple"?
There is a search interface that can list all matching data items for a given term. It's called wbsearchentities
, for instance https://www.wikidata.org/w/api.php?action=wbsearchentities&search=apple&language=en (add format=json
for machine readable JSON).
However, the ranking in the result is very naive. And without the semantic context of the original sentence, there is no way to find which word sense is meant. This is an interesting area of research called "word sense disambiguation".
Finally, how can I send a text query and read the "type" of the response data structure?
At the moment, you will have to do two API calls: one to wbsearchentities
to get the ID of the entity you are interested in, and one to wbgetentities
to get the instance-of statement for that entity. It would be nice to combine this in a single call; there's a ticket open for this: https://phabricator.wikimedia.org/T90693
As to Siri-like services: an early prototype called "wiri" by Magnus Manske has been around for a long time. It uses very simple patterns though: https://tools.wmflabs.org/magnus-toolserver/thetalkpage/
Bene* has been working on a more advanced approach for natural language question answering, see the Platypus Demo: https://projetpp.github.io/demo.html
Just yesterday, he presented a new prototype he has been developing together with Tpt, which generates SPARQL queries from natural language input: https://tools.wmflabs.org/ppp-sparql/
All of these projects are open source, and were created by enthusiastic volunteers. Look at the code and talk to them. :)