Tags: r, wikipedia, text-mining, wikipedia-api, mediawiki-api

How to access Wikipedia from R?


Is there an R package that can query Wikipedia (most probably through the MediaWiki API) to get a list of articles relevant to a query, and to import selected articles for text mining?


Solution

  • Use the RCurl package to retrieve the data, and the XML or RJSONIO package to parse the response.

    If you are behind a proxy, set your connection options. If you are not, use opts <- list() instead.

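    library(RCurl)    # getForm
    library(RJSONIO)  # fromJSON

    # Proxy settings passed to curl; replace these with your own values.
    opts <- list(
      proxy         = "136.233.91.120",
      proxyusername = "mydomain\\myusername",
      proxypassword = "whatever",
      proxyport     = 8080
    )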
    

    Use the getForm function to access the API.

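    # Wikipedia serves the API over HTTPS; plain HTTP requests are redirected.
    search_example <- getForm(
      "https://en.wikipedia.org/w/api.php",
      action  = "opensearch",  # title search
      search  = "Te",          # search term
      format  = "json",
      .opts   = opts
    )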
    

    Parse the results.

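    # getForm may return raw bytes or a character string, so handle both.
    fromJSON(if (is.raw(search_example)) rawToChar(search_example) else search_example)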
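
The steps above can be wrapped into a few small helpers, and the same API can also return article text for mining, which covers the second half of the question. This is only a minimal sketch under some assumptions: `opensearch_params`, `extract_params`, `call_api`, `opensearch_to_df`, `search_wikipedia` and `fetch_article_text` are hypothetical names of my own, not part of RCurl or RJSONIO. It assumes the opensearch reply is a four-element array (search term, titles, descriptions, URLs) and that the wiki supports `prop = "extracts"` with `explaintext`, which I believe is the case for en.wikipedia.org.

```r
# Requires the RCurl and RJSONIO packages; calls are namespace-qualified.
api_url <- "https://en.wikipedia.org/w/api.php"

# Hypothetical helper: query parameters for a title search.
opensearch_params <- function(term, limit = 10) {
  c(action    = "opensearch",
    search    = term,
    limit     = as.character(limit),  # maximum number of titles returned
    namespace = "0",                  # 0 = ordinary articles
    format    = "json")
}

# Hypothetical helper: query parameters for a plain-text extract of one article.
extract_params <- function(title) {
  c(action      = "query",
    prop        = "extracts",
    explaintext = "1",   # plain text instead of HTML
    redirects   = "1",   # follow redirects to the target article
    titles      = title,
    format      = "json")
}

# Send a GET request and parse the JSON reply.
# getForm may return raw bytes or a character string, so both are handled.
call_api <- function(params, opts = list()) {
  response <- RCurl::getForm(api_url, .params = params, .opts = opts)
  if (is.raw(response)) response <- rawToChar(response)
  RJSONIO::fromJSON(response)
}

# Assumes `parsed` is list(term, titles, descriptions, urls).
opensearch_to_df <- function(parsed) {
  data.frame(
    title = as.character(unlist(parsed[[2]])),
    url   = as.character(unlist(parsed[[4]])),
    stringsAsFactors = FALSE
  )
}

# Search for article titles and return them as a data frame.
search_wikipedia <- function(term, limit = 10, opts = list()) {
  opensearch_to_df(call_api(opensearch_params(term, limit), opts))
}

# Return the plain text of the first page in the reply.
fetch_article_text <- function(title, opts = list()) {
  pages <- call_api(extract_params(title), opts)[["query"]][["pages"]]
  pages[[1]][["extract"]]
}

# Usage (needs network access):
# ua   <- list(useragent = "my-text-mining-script (me@example.org)")
# hits <- search_wikipedia("Tea", limit = 5, opts = ua)
# text <- fetch_article_text(hits$title[1], opts = ua)
```

Wikimedia asks API clients to send a descriptive User-Agent, so the usage lines pass one through curl's `useragent` option; proxy settings can go in the same list. The resulting character string can then be handed to whatever text-mining package you prefer.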