r, zip, connection

Using R to download gzipped data file, extract, and import data


A follow-up to this question: how can I download and uncompress a gzipped file using R? For example, the UCI Machine Learning Repository hosts a gzipped archive of insurance data. How can I download it, extract it, and import the data using R?

Here is the data URL: http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz


Solution

  • I like Ramnath's approach, but I would extract the archive into a temporary directory, like so:

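    # Directory for the extracted files (R's per-session temp directory)
    tmpdir <- tempdir()

    url <- 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
    file <- basename(url)
    # mode = 'wb' keeps the binary archive intact on Windows
    download.file(url, file, mode = 'wb')

    # Unpack the gzipped tarball into tmpdir, then list what was extracted
    untar(file, compressed = 'gzip', exdir = tmpdir)
    list.files(tmpdir)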
    

    The list.files() call should print something like this:

    [1] "TicDataDescr.txt" "dictionary.txt"   "ticdata2000.txt"  "ticeval2000.txt"  "tictgts2000.txt" 
    

    which you could parse if you needed to automate this process for a lot of files.
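
To automate the extract-and-import step for many archives, one option is to wrap it in a small function. The sketch below is not part of the original answer: `extract_archive` is a hypothetical helper name, and the commented usage assumes that `ticdata2000.txt` is a whitespace-delimited table with no header row, which you should check against `TicDataDescr.txt` before relying on it.

```r
# Hypothetical helper (not from the original answer): extract an archive
# into its own fresh directory and return the full paths of its files.
extract_archive <- function(archive) {
  exdir <- tempfile("extracted_")  # unique path inside tempdir()
  dir.create(exdir)
  untar(archive, exdir = exdir)    # compression type is inferred from the file
  list.files(exdir, full.names = TRUE, recursive = TRUE)
}

# Usage, assuming tic.tar.gz was downloaded as shown above:
# paths <- extract_archive("tic.tar.gz")
# train <- read.table(paths[basename(paths) == "ticdata2000.txt"],
#                     header = FALSE)  # assumption: no header row
# dim(train)
```

Giving each archive its own directory keeps the file listing free of unrelated files that may already be sitting in the session's temp directory.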