Tags: xml, r, httr

URL gets truncated with httr::GET vs xmlParse


I am trying to request an XML document with two different methods (xmlParse and httr::GET) and expect the response to be the same. The response I get with xmlParse is what I expect but with httr::GET my request URL gets truncated at some point.

An example:

library(httr)
library(XML)
library(rvest)

term <- "alopecia areata"
request <- paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=",term)  

#requesting URL with XML
xml_response <- xmlParse(request)

xml_response %>%
        xml_nodes(xpath = "//Result/Term") %>%
        xml_text 

This returns, as expected:

[1] "alopecia areata"        

Now with httr:

httr_response <- GET(request)
httr_content <- content(httr_response)

httr_content %>%
        xml_nodes(xpath = "//Result/Term") %>%
        xml_text 

This returns:

[1] "alopecia"

Interestingly, the request URL stored in httr_response is correct, but the final URL of the response (after the server redirects the request) contains only the truncated term:

> httr_response$request$opts$url

[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=alopecia areata"

> httr_response$url

[1] "http://eutils.ncbi.nlm.nih.gov/gquery?term=alopecia&retmode=xml"

So at some point my query term gets truncated. If I paste the whole URL into a browser by hand, it works as expected.
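The truncation is most likely caused by the literal space in request: a space is not a legal character in a URL, so it has to be percent-encoded before the request is sent. Browsers do this silently, which would explain why pasting the URL by hand works. A minimal offline sketch using base R's utils::URLencode() (not part of the original code) to detect the space and encode only the search term:

```r
term <- "alopecia areata"
base <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term="

# The raw URL built with paste0() still contains an unencoded space
request <- paste0(base, term)
has_space <- grepl(" ", request, fixed = TRUE)

# Encode just the term; reserved = TRUE also escapes characters such as
# "&" or "=" that would otherwise break the query string
encoded_term <- utils::URLencode(term, reserved = TRUE)
safe_request <- paste0(base, encoded_term)

print(has_space)     # the original URL contains a space
print(safe_request)  # the space is now written as %20
```

The encoded safe_request can then be passed to GET() or xmlParse() in place of request. Servers generally accept both %20 and + for a space in a query string, but + is a form-encoding convention, so %20 is the more general choice.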

Any suggestions on how to resolve this would be greatly appreciated.


Solution

  • Spaces are not allowed in URLs, so they need to be encoded. You can replace the space in your URL with a + to prevent the term from being truncated:

    httr_response <- GET(gsub(" ","+",request))
    httr_content <- content(httr_response)
    
    httr_content %>%
            xml_nodes(xpath = "//Result/Term") %>%
            xml_text 
    
    #[1] "alopecia areata"
    

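Rather than patching spaces with gsub(), httr can build and encode the query string itself when the parameters are passed through the query argument, which both GET() and modify_url() accept. A sketch, assuming httr is installed; the network call is left commented out so the URL construction can be checked offline:

```r
library(httr)

base_url <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi"
term <- "alopecia areata"

# modify_url() composes the query string and percent-encodes each value,
# so the space in the term is sent as %20 instead of a literal space
built_url <- modify_url(base_url, query = list(term = term))
print(built_url)

# The same query list can be handed straight to GET(), which encodes it the same way:
# httr_response <- GET(base_url, query = list(term = term))
# httr_content  <- content(httr_response)
```

Passing parameters as a list also means other special characters in a search term (for example "&") are escaped without any extra gsub() calls.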