Search code examples
rdata.tablezipunzipdata-extraction

Extract folder names inside of *.rar and *.zip files


I have a folder with multiple *.rar and *.zip files. Each *.rar and *.zip files have one folder and inside this folder have multiples folders.

I would like to generate a dataset with the names of these multiple folders.

How can I do this using R?

I trying:

temp <- list.files(pattern = "\\.zip$")
lapply(temp, function(x) unzip(x, list = T))

But it returns:

enter image description here

I would like to get just the names: "Nova pasta1" and Nova pasta2"

Thanks


Solution

  • Let's create an simple set of directories/files that are representative of your own. You described having a single .zip file that contains multiple zipped directories, which may contain unzipped files and/or sub-directoris.

    # Example main directory
    dir.create("main_dir")
    
    # Example directory with 1 file and a subdirectory with 1 file
    dir.create("main_dir/example_dir1")
    write.csv(data.frame(x = 5), file = "main_dir/example_dir1/example_file.csv")
    dir.create("main_dir/example_dir1/example_subdir")
    write.csv(data.frame(x = 5), file = "main_dir/example_dir1/example_subdir/example_subdirfile.csv")
    
    # Example directory with 1 file
    dir.create("main_dir/example_dir2")
    write.csv(data.frame(x = "foo"), file = "main_dir/example_dir2/example_file2.csv")
    
    # NOTE: I was having issues with using `zip()` to zip each directory
    # then the main (top) directory, so I manually zipped them below.
    
    # Manually zip example_dir1 and example_dir2, then zip main_dir at this point.
    

    Given this structure, we can get the paths to all of the directories within the highest level directory (main_dir) using unzip(list = TRUE) since we know the name of the single zipped directory containing all of these additional zipped sub-directories.

    # Unzip the highest level directory available, get all of the .zip dirs within
    ex_path <- "main_dir"
    all_zips <- unzip(zipfile = paste0(ex_path, ".zip"), list = TRUE)
    all_zips
    
    # We can remove the main_path string if we want so that we only
    # the zip files within our main directory instead of the full path.
    library(dplyr)
    
    all_zips %>%
      filter(Name != paste0(ex_path, "/")) %>%
      mutate(Name = sub(paste0(ex_path, "/"), "", Name))
    

    If you had multiple zipped directories with nested directories similar to main_dir, you could just put their paths in a list and apply the function to each element of the list. Below I reproduce this.

    # Example of multiple zip directory paths in a list
    ziplist  <- list(ex_path, ex_path, ex_path)
    
    lapply(ziplist, function(x) {
      temp <- unzip(zipfile = paste0(x, ".zip"), list = TRUE)
      temp <- temp %>% mutate(main_path = x)
      temp <- temp %>% 
               filter(Name != paste0(ex_path, "/")) %>%
               mutate(Name = sub(paste0(ex_path, "/"), "", Name))
      temp
    })
    

    If all of the .zip files in the current working directory are files you want to do this for, you can get ziplist above via:

    list.files(pattern = ".zip") %>% as.list()