
Querying a vector of strings to Wikidata using WikidataQueryServiceR


Given a vector of movie titles, I would like to look up their genres by querying Wikidata.

As an R user, I recently discovered WikidataQueryServiceR, which has exactly the example I was looking for:

library(WikidataQueryServiceR)
query_wikidata('SELECT DISTINCT
  ?genre ?genreLabel
WHERE {
  ?film wdt:P31 wd:Q11424.
  ?film rdfs:label "The Cabin in the Woods"@en.
  ?film wdt:P136 ?genre.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}')

## 5 rows were returned by WDQS

Unfortunately, this query uses static text, so I would like to replace "The Cabin in the Woods" with a vector. To do so, I tried the following code:

library(WikidataQueryServiceR)

example <- "The Cabin in the Woods" # Single string for testing purposes.

query_wikidata(paste('SELECT DISTINCT ?human ?humanLabel ?sex_or_gender ?sex_or_genderLabel WHERE {
  ?human wdt:P31 wd:Q5.
  ?human rdfs:label', example, '@en.
  ?human wdt:P21 ?sex_or_gender.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  OPTIONAL { ?human wdt:P2561 ?name. }
}', sep = ""))

But that does not work as expected, as I get the following result:

Error in FUN(X[[i]], ...) : Bad Request (HTTP 400).

What am I doing wrong?


Solution

  • Have you tried printing your SPARQL query? Doing so reveals two problems:

    • There is no space after rdfs:label
    • There are no quotes around The Cabin in the Woods

    In your R code, instead of

      ?human rdfs:label', example, '@en.
    

    line 7 should be:

      ?human rdfs:label "', example, '"@en.
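One way to act on the first suggestion is to build the query string separately and print it with cat() before sending it. A minimal sketch using only base R and the film query from the question; the variable name query is just for illustration:

```r
example <- "The Cabin in the Woods"

# Build the query first so it can be inspected before it is sent.
# Note the space and the double quotes around the pasted-in title.
query <- paste('SELECT DISTINCT ?genre ?genreLabel WHERE {
  ?film wdt:P31 wd:Q11424.
  ?film rdfs:label "', example, '"@en.
  ?film wdt:P136 ?genre.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}', sep = "")

# Print the exact text that would go to the endpoint.
cat(query, "\n")

# Once it looks right, pass it on:
# WikidataQueryServiceR::query_wikidata(query)
```

Printing the string makes a missing space or missing quotes visible immediately, which is much easier to read than an HTTP 400 response.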
    

    Although query_wikidata() can accept a vector of strings, I'd suggest using SPARQL 1.1 VALUES instead, to avoid sending too many requests.

    library(WikidataQueryServiceR)
    
    example <- c("John Lennon", "Paul McCartney")
    
    values <- paste(sprintf("('%s'@en)", example), collapse=" ")
    
    query <- paste(
      'SELECT DISTINCT ?label ?human ?humanLabel ?sexLabel {
           VALUES(?label) {', values,
          '} 
           ?human wdt:P31 wd:Q5.
           ?human rdfs:label ?label.
           ?human wdt:P21 ?sex.
           SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
       }'
    )  
    
    query_wikidata(query)
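The single-quoted literals in the VALUES line above would break for a name that itself contains an apostrophe. A possible workaround, sketched with a hypothetical helper to_values() that is not part of WikidataQueryServiceR, is to emit double-quoted literals and escape backslashes and double quotes:

```r
# Hypothetical helper: turn a character vector into the body of a
# VALUES clause, one language-tagged literal per element.
to_values <- function(x, lang = "en") {
  # Escape backslashes first, then double quotes (SPARQL string escapes).
  x <- gsub("\\", "\\\\", x, fixed = TRUE)
  x <- gsub('"', '\\"', x, fixed = TRUE)
  paste(sprintf('("%s"@%s)', x, lang), collapse = " ")
}

values <- to_values(c("John Lennon", "Conan O'Brien"))
cat(values, "\n")
```

The result can be pasted into the VALUES(?label) { ... } block in place of the values string built with sprintf() above.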
    

    For a large number of VALUES, you will probably need the development version of WikidataQueryServiceR: it seems that only the development version supports POST requests.
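If installing the development version is not an option, another way to keep each request small is to split the vector into batches and build one VALUES query per batch. This is only a sketch: build_query() is a hypothetical helper, and the batch size is arbitrary and chosen to keep the example short.

```r
# Hypothetical helper: build one VALUES query for a batch of names.
build_query <- function(people) {
  values <- paste(sprintf("('%s'@en)", people), collapse = " ")
  paste(
    "SELECT DISTINCT ?label ?human ?humanLabel ?sexLabel {",
    "  VALUES (?label) {", values, "}",
    "  ?human wdt:P31 wd:Q5.",
    "  ?human rdfs:label ?label.",
    "  ?human wdt:P21 ?sex.",
    '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
    "}",
    sep = "\n"
  )
}

people <- c("John Lennon", "Paul McCartney", "George Harrison")
batch_size <- 2  # arbitrary; tune to what the endpoint accepts

# Split the vector into consecutive groups of at most batch_size names.
batches <- split(people, ceiling(seq_along(people) / batch_size))
queries <- lapply(batches, build_query)

# Each query can then be sent on its own and the results combined,
# assuming every call returns a data frame with the same columns:
# results <- do.call(rbind, lapply(queries, WikidataQueryServiceR::query_wikidata))
```

This trades one large request for a handful of moderate ones, which still avoids sending one request per name.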