R query of a MediaWiki server


I am trying to query the Cameo database.

If I open the URL https://cameo.mfa.org/api.php?action=query&pageids=17051&prop=extracts&format=json in a browser, I get valid JSON output.

However, if I use:

library(httr)
library(jsonlite)

base_url <- "https://cameo.mfa.org/api.php"

query_param <- list(action  = "query",
                    pageids = "17051",
                    format = "json",
                    prop = "extracts"
)

parsed_content <- httr::GET(base_url, query_param)

jsonlite::fromJSON(content(parsed_content, as = "text", encoding = "UTF-8"))

Then jsonlite::fromJSON fails because the response body is HTML, not JSON.

Do you have any advice on this?


Solution

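  • The second positional argument of httr::GET is config=, not the query, so passing query_param unnamed does not add the parameters to the URL. That is presumably why the server answers with an HTML page instead of JSON. Pass the list by name instead, as query = query_param.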

    res <- httr::GET(base_url, query = query_param)
    res
    # Response [https://cameo.mfa.org/api.php?action=query&pageids=17051&format=json&prop=extracts]
    #   Date: 2023-07-03 15:06
    #   Status: 200
    #   Content-Type: application/json; charset=utf-8
    #   Size: 5.22 kB
    str(httr::content(res))
    # List of 3
    #  $ batchcomplete: chr ""
    #  $ warnings     :List of 1
    #   ..$ extracts:List of 1
    #   .. ..$ *: chr "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are li"| __truncated__
    #  $ query        :List of 1
    #   ..$ pages:List of 1
    #   .. ..$ 17051:List of 4
    #   .. .. ..$ pageid : int 17051
    #   .. .. ..$ ns     : int 0
    #   .. .. ..$ title  : chr "Copper"
    #   .. .. ..$ extract: chr "<h2><span id=\"Description\">Description</span></h2>\n<p>A reddish-brown, ductile, metallic element. Copper is "| __truncated__
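To check the fix without touching the network, the request URL can be built offline and the nested result unpacked with a small helper. This is only a sketch: httr::modify_url is used here as a stand-in for what GET(base_url, query = query_param) should request, get_extract is a hypothetical helper name, and mock_parsed is a hand-written list that imitates the str() output above rather than a real server response.

```r
library(httr)

base_url <- "https://cameo.mfa.org/api.php"

query_param <- list(action  = "query",
                    pageids = "17051",
                    format  = "json",
                    prop    = "extracts")

# Build the URL offline; the named query list becomes the query string.
request_url <- httr::modify_url(base_url, query = query_param)
print(request_url)

# Hypothetical helper: pull the HTML extract for one page id out of the
# parsed response (the list returned by httr::content(res)).
get_extract <- function(parsed, pageid) {
  page <- parsed$query$pages[[as.character(pageid)]]
  if (is.null(page)) {
    stop("page id not found in response: ", pageid)
  }
  page$extract
}

# Assumption: a mock with the same shape as the real parsed response,
# used so the example runs without a network connection.
mock_parsed <- list(
  batchcomplete = "",
  query = list(
    pages = list(
      `17051` = list(pageid  = 17051L,
                     ns      = 0L,
                     title   = "Copper",
                     extract = "<h2>Description</h2>")
    )
  )
)

extract_html <- get_extract(mock_parsed, 17051)
print(extract_html)
```

With a live request, the same helper would be called as get_extract(httr::content(res), 17051), after checking that the response's Content-Type is application/json as in the output above.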