Unzip a tar.gz file in R?


I wish to download and open the following tar.gz file in R:

http://s.wordpress.org/resources/survey/wp2011-survey.tar.gz

Is there a command which can accomplish this?


Solution

  • fn <- "http://s.wordpress.org/resources/survey/wp2011-survey.tar.gz"
    ## mode="wb" keeps the binary archive intact on Windows
    download.file(fn, destfile="tmp.tar.gz", mode="wb")
    untar("tmp.tar.gz", list=TRUE)  ## list the contents without extracting
    untar("tmp.tar.gz")             ## extract everything into the working directory
    ## or, if you just want to extract the target file:
    untar("tmp.tar.gz", files="wp2011-survey/anon-data.csv")
    X <- read.csv("wp2011-survey/anon-data.csv")
    

    Tom Wenseleers points out that the archive package can help with this:

    library(archive)
    library(readr)
    ## file = 3 selects the third archive member by position
    read_csv(archive_read("tmp.tar.gz", file = 3), col_types = cols())
    

    He also notes that archive::archive_extract("tmp.tar.gz", files="wp2011-survey/anon-data.csv") is quite a bit faster than base R's built-in untar, especially for large archives. The package supports the 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats.
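
The base R list-then-extract steps can also be tried without a network connection. The following is a minimal sketch, not part of the original answer: it builds a small archive with tar() and then lists and selectively extracts it with untar(). The survey/ directory, the file names, and the data are all made up for illustration and have nothing to do with the contents of the WordPress survey archive.

```r
## Build a tiny .tar.gz in a temporary directory (hypothetical files).
work <- tempfile("untar-demo-")
dir.create(file.path(work, "survey"), recursive = TRUE)
write.csv(data.frame(id = 1:3, answer = c("a", "b", "c")),
          file.path(work, "survey", "data.csv"), row.names = FALSE)
writeLines("notes", file.path(work, "survey", "README.txt"))

## tar() stores paths relative to the working directory, so switch into
## the temporary directory while creating the archive, then switch back.
old <- setwd(work)
tar("demo.tar.gz", files = "survey", compression = "gzip")
setwd(old)

archive_path <- file.path(work, "demo.tar.gz")

## Step 1: list the members without extracting anything.
members <- untar(archive_path, list = TRUE)
print(members)

## Step 2: extract only the CSV into a separate directory
## (exdir is created if it does not exist) and read it.
out_dir <- file.path(work, "out")
untar(archive_path, files = "survey/data.csv", exdir = out_dir)
X <- read.csv(file.path(out_dir, "survey", "data.csv"))
print(X)
```

Passing exdir keeps the extracted files out of the working directory, which can be convenient when the archive is only needed temporarily.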