r csv http authentication httr

How to download a large csv file from a URL with basic HTTP authentication into a data frame with R


I am struggling to retrieve a very large (4 GB) CSV file protected by basic HTTP authentication using R. I have no issue receiving the response with the following code:

library(httr)
get_resp <- GET(url, authenticate(user, pass), content_type("text/csv"))

However, when I try to call:

data <- content(get_resp)

I receive an error saying that R character strings are limited to 2^31-1 bytes. I need to get the text data into a data frame for analysis. Can anyone suggest an alternative solution?


Solution

  • It seems you have the problem described here. The suggestion was to use the write_disk function so that httr saves the response body straight to a file instead of loading it into R's memory.

    Something like

    library(httr)
    tmp <- tempfile()
    # write_disk() saves the response body to tmp instead of keeping it in memory
    GET(url, authenticate(user, pass), content_type("text/csv"), write_disk(tmp))
    message("Data downloaded to ", tmp)
    

    Then you can read the file into R in chunks, or split it into smaller files before importing.
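
For the chunked-reading route, here is a minimal sketch in base R. The helper name read_csv_in_chunks, its arguments, and the callback are hypothetical and not part of httr or the original answer. It assumes the file has a header row and that each chunk can be processed on its own (filtered, aggregated, or appended to a database) so the full 4 GB never sits in memory at once.

```r
# Hypothetical helper: apply `callback` to successive chunks of a CSV file.
# Assumes the first line of the file is a header row.
read_csv_in_chunks <- function(path, chunk_rows, callback) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The first read consumes the header line and fixes the column names.
  chunk <- read.csv(con, nrows = chunk_rows, stringsAsFactors = FALSE)
  col_names <- names(chunk)

  while (nrow(chunk) > 0) {
    callback(chunk)

    # Later reads continue from the current position of the open connection.
    # At end of file the read yields zero rows (or, in some cases, an error);
    # either one ends the loop.
    chunk <- tryCatch(
      read.csv(con, nrows = chunk_rows, header = FALSE,
               col.names = col_names, stringsAsFactors = FALSE),
      error = function(e) NULL
    )
    if (is.null(chunk)) break
  }
  invisible(NULL)
}
```

With the download above, a call might look like read_csv_in_chunks(tmp, 100000, function(chunk) print(nrow(chunk))), where the chunk size is an arbitrary choice. Column types are guessed per chunk, so passing colClasses through to read.csv may be worth adding if the types need to stay consistent.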