
Retrieve Wikidata ID candidates based on a partial name match


I have some entities in a specific language and I am trying to retrieve the possible IDs from Wikidata that match those names.

For example, given a German name such as "Ministerium für Auswärtige Angelegenheiten", I can get the top N candidate IDs that correspond to it like this:

    SELECT ?item
    WHERE
    {
        ?item rdfs:label "Ministerium für Auswärtige Angelegenheiten"@de
    }
    LIMIT 2

and this will give me 2 candidate IDs.

The issue is that if a name contains some inflection, there is no exact match in the database and nothing is returned.

Even with the current example, "Ministerium für Auswärtige Angelegenheiten", removing the word "für" means no results are returned.

Is there a way to make the search more flexible and return the closest results to the query, even if they are incorrect?

P.S. I am doing this from Python, using SPARQLWrapper.
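For reference, the exact-match lookup described above could be run from Python roughly as follows. This is a minimal sketch, not the asker's actual code: the helper names `build_label_query` and `find_exact_matches` and the user-agent string are hypothetical, and it assumes the public endpoint at https://query.wikidata.org/sparql, where the `rdfs:` prefix is predefined.

```python
def build_label_query(label, lang="de", limit=2):
    """Build the exact-label SPARQL query from the question (hypothetical helper)."""
    # Escape backslashes and double quotes so the label stays a valid string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE {\n"
        f'  ?item rdfs:label "{escaped}"@{lang} .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def find_exact_matches(label, lang="de", limit=2):
    """Return the Q-ids whose label matches `label` exactly (hypothetical helper)."""
    # Imported here so the query builder works without the third-party package.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(
        "https://query.wikidata.org/sparql",
        agent="label-lookup-example/0.1",  # placeholder user agent, an assumption
    )
    sparql.setQuery(build_label_query(label, lang, limit))
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    # Each binding holds the full entity URI; keep only the trailing Q-id.
    return [b["item"]["value"].rsplit("/", 1)[-1] for b in bindings]
```

Calling `find_exact_matches("Ministerium für Auswärtige Angelegenheiten")` sends the query shown earlier; because the match is on the exact label string, any change to the name yields an empty list.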


Solution

  • Not with the WQS SPARQL service, if I am not mistaken.

    For similar use cases, the full-text search engine might be workable. Take a look at a search query in the API Sandbox, which returns some relevant results.
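The full-text search suggestion can be sketched in Python with only the standard library. The code below assumes the MediaWiki Action API at https://www.wikidata.org/w/api.php with `action=query` and `list=search`; the function names and the user-agent string are hypothetical, and the exact query used in the API Sandbox link is not known, so treat this as one possible form of it rather than the answerer's query.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_params(text, limit=5):
    """Parameters for a full-text search request (hypothetical helper)."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": text,   # free text, no exact label match required
        "srlimit": limit,   # maximum number of hits to return
        "format": "json",
    }


def extract_ids(response):
    """Pull page titles out of a search response; an empty response gives []."""
    # In the main namespace of Wikidata, page titles are the Q-ids.
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]


def search_candidates(text, limit=5):
    """Query the API and return candidate IDs, in the order the API ranks them."""
    url = API_URL + "?" + urllib.parse.urlencode(build_search_params(text, limit))
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "label-lookup-example/0.1"},  # placeholder, an assumption
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return extract_ids(json.load(resp))
```

For example, `search_candidates("Ministerium Auswärtige Angelegenheiten")` searches for the name without "für". The results are ranked by the search engine, so they are candidates to be checked rather than guaranteed matches.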