I have no problem querying the MediaWiki API of French Wikipedia for strings without accents:
string <- 'chien'
string <- stringi::stri_enc_toutf8(string, is_unknown_8bit = FALSE, validate = FALSE)
apiQuery <- paste0('https://fr.wikipedia.org/w/api.php?action=query&format=xml&titles=', string)
page <- xml2::read_xml(apiQuery)
{xml_document} [1] \n \n \n \n \n <page _idx="2736914" pageid="2736914 ...
but I have a problem with strings that contain accents:
string <- 'être'
string <- stringi::stri_enc_toutf8(string, is_unknown_8bit = FALSE, validate = FALSE)
apiQuery <- paste0('https://fr.wikipedia.org/w/api.php?action=query&format=xml&titles=', string)
page <- xml2::read_xml(apiQuery)
I receive the following error:
Error in open.connection(x, "rb") : HTTP error 400.
You need to percent-encode (URL-encode) the query, because the raw "ê" is not valid in a URL:
page <- xml2::read_xml(URLencode(apiQuery))
This changes the "ê" to "%C3%AA".
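As a variation, you can encode only the title rather than the whole URL, which also escapes reserved characters such as "&" or "/" if they appear in a title. This is a minimal sketch, not the only way to do it: `build_query_url` is a hypothetical helper name, and the call to `enc2utf8()` is there on the assumption that your session's native encoding might not be UTF-8 (in which case the escaped bytes would differ from "%C3%AA").

```r
# Hypothetical helper: build a MediaWiki query URL for a single page title.
build_query_url <- function(title) {
  # Make sure the bytes that get escaped are the UTF-8 bytes
  title <- enc2utf8(title)
  # reserved = TRUE also escapes characters such as &, ?, = and /
  encoded <- utils::URLencode(title, reserved = TRUE)
  paste0("https://fr.wikipedia.org/w/api.php?action=query&format=xml&titles=",
         encoded)
}

url <- build_query_url("\u00eatre")  # "\u00ea" is "ê", written as an escape
print(url)

# Requires network access and the xml2 package:
# page <- xml2::read_xml(url)
```

Writing the title as "\u00eatre" just avoids depending on the encoding of the script file; passing 'être' directly should behave the same in a UTF-8 session.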