I have a collection of WAV-files in different sub-directories. I would like to get a count of how many WAV-files there are per project, but only from a specific sub-directory, "Files for analysis", a folder that is present within each project directory. What is the best way to go about this?
Each main directory is the name of a recording project. Inside each project directory are two sub-directories, "Files for analysis" and "Backup". Within each of these is a collection of subfolders with the recording rounds, with sub-folders for each recording device. Inside of these are numerous WAV-files. Visually, the folder structure looks like this (with many more projects and WAV-files, the names are just examples):
Project 1
|-- Files for analysis
|   |-- Round 1
|   |   |-- Box 1 --- 1.WAV, 2.WAV, 3.WAV
|   |   |-- Box 2 --- 1.WAV, 2.WAV, 3.WAV
|   |-- Round 2
|       |-- Box 3 --- 1.WAV, 2.WAV, 3.WAV
|       |-- Box 4 --- 1.WAV, 2.WAV, 3.WAV
|-- Backup
    |-- Round 1
    |   |-- Box 1 --- 1.WAV, 2.WAV, 3.WAV
    |   |-- Box 2 --- 1.WAV, 2.WAV, 3.WAV
    |-- Round 2
        |-- Box 3 --- 1.WAV, 2.WAV, 3.WAV
        |-- Box 4 --- 1.WAV, 2.WAV, 3.WAV
Project 2
|-- Files for analysis
|   |-- Round 1
|   |   |-- Box 5 --- 1.WAV, 2.WAV, 3.WAV
|   |   |-- Box 6 --- 1.WAV, 2.WAV, 3.WAV
|   |-- Round 2
|       |-- Box 7 --- 1.WAV, 2.WAV, 3.WAV
|       |-- Box 8 --- 1.WAV, 2.WAV, 3.WAV
|-- Backup
    |-- Round 1
    |   |-- Box 5 --- 1.WAV, 2.WAV, 3.WAV
    |   |-- Box 6 --- 1.WAV, 2.WAV, 3.WAV
    |-- Round 2
        |-- Box 7 --- 1.WAV, 2.WAV, 3.WAV
        |-- Box 8 --- 1.WAV, 2.WAV, 3.WAV
On my computer, an example file path to a WAV-file would look like this:
S:/sound_files/2024/R/testfolder/Project 1/Files for analysis/Round 1/Box 1/1.WAV
So far I have cobbled together a script that gives me the number of WAV-files per sub-directory ("box"), but not per project. (I'm not a programmer so apologies in advance for sub-par code!)
main <- "S:/sound_files/2024/R/testfolder"
## List all folders
dirs <- list.dirs(main, full.names = TRUE, recursive=TRUE)
## List top-level project folders
only_mains <- dirs[lengths(strsplit(dirs, "/")) == 6 ]
## get folders with "Files for analysis"
dir_files_for_analysis <- dirs[lengths(strsplit(dirs, "/")) == 7 ]
dir_files_for_analysis <- grep("Files for analysis", dir_files_for_analysis, value = TRUE)
## List all WAV-files in Files for analysis
files <- list.files(dir_files_for_analysis, pattern = ".WAV", recursive = TRUE, full.names = TRUE)
length(files) ## How many WAV-files total
## get sub-directory folders with files
dir_list <- split(files, dirname(files))
files_in_folder <- sapply(dir_list, length)
head(files_in_folder)
If I replace dirname(files) with only_mains, the split function just splits the files by the number of project folders, irrespective of which folders the files come from. I have not been able to find a way to extract the directory path for the project directory, only the files' own directory (e.g. "Box 1"), via dirname().
What I get is this:
S:/sound_files/2024/R/testfolder/Project 1/Files for analysis/Round 1/Box 1
20
S:/sound_files/2024/R/testfolder/Project 1/Files for analysis/Round 2/Box 2
19
S:/sound_files/2024/R/testfolder/Project 2/Files for analysis/Round 1/Box 3
20
S:/sound_files/2024/R/testfolder/Project 2/Files for analysis/Round 2/Box 4
20
The ideal result for this script should look like this:
S:/sound_files/2024/R/testfolder/Project 1 39 files
S:/sound_files/2024/R/testfolder/Project 2 40 files
You may try the following:
main <- "S:/sound_files/2024/R/testfolder"
# Get the full path of every project directory
all_projects <- list.dirs(main, recursive = FALSE, full.names = TRUE)
# Count the WAV-files under a project's "Files for analysis" folder
count_files_from_folder <- function(folder) {
  # The folder name must match the name on disk ("Files for analysis");
  # the dot is escaped and the pattern anchored so only names ending in .WAV count
  length(list.files(file.path(folder, "Files for analysis"),
                    pattern = "\\.WAV$", recursive = TRUE))
}
# Count the files for each project
sapply(all_projects, count_files_from_folder)
In the test structure that I set up to verify my answer, it returns a named vector:
#Test/Project 1 Test/Project 2
#             4              7
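To check the behaviour without touching the real recordings, the same idea can be tried on a throwaway folder tree. The following is only a sketch: the folder names under tempdir(), the file counts, and the helper name count_wavs are made up for illustration, and the empty files merely stand in for WAV-files. It shows that files under "Backup" are not counted.

```r
# Build a small, hypothetical directory tree under tempdir()
main <- file.path(tempdir(), "testfolder")
unlink(main, recursive = TRUE)  # start clean if the sketch is re-run
for (project in c("Project 1", "Project 2")) {
  for (top in c("Files for analysis", "Backup")) {
    box <- file.path(main, project, top, "Round 1", "Box 1")
    dir.create(box, recursive = TRUE)
    # Project 1 gets 2 dummy files per box, Project 2 gets 3
    n <- if (project == "Project 1") 2 else 3
    file.create(file.path(box, paste0(seq_len(n), ".WAV")))
  }
}

# Count files ending in .WAV below "<project>/Files for analysis" only
count_wavs <- function(project_dir) {
  length(list.files(file.path(project_dir, "Files for analysis"),
                    pattern = "\\.WAV$", recursive = TRUE))
}

all_projects <- list.dirs(main, recursive = FALSE, full.names = TRUE)
counts <- vapply(all_projects, count_wavs, integer(1))
names(counts) <- basename(all_projects)  # label by project name, not full path
counts
```

vapply() is used here instead of sapply() only so that the result is guaranteed to be an integer vector; either works for this task.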
If you wish to get a data frame as output, you may stack it.
stack(sapply(all_projects, count_files_from_folder))[2:1]
#             ind values
#1 Test/Project 1      4
#2 Test/Project 2      7
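If the file paths are already in a character vector (like the files object in the question), the project can also be read off each path directly instead of listing every project folder separately. A minimal sketch, assuming every path starts with main followed by the project folder; the example paths below are invented to match the layout in the question:

```r
main <- "S:/sound_files/2024/R/testfolder"

# Hypothetical paths shaped like the example in the question
files <- c(
  "S:/sound_files/2024/R/testfolder/Project 1/Files for analysis/Round 1/Box 1/1.WAV",
  "S:/sound_files/2024/R/testfolder/Project 1/Files for analysis/Round 2/Box 3/1.WAV",
  "S:/sound_files/2024/R/testfolder/Project 1/Backup/Round 1/Box 1/1.WAV",
  "S:/sound_files/2024/R/testfolder/Project 2/Files for analysis/Round 1/Box 5/1.WAV"
)

# Drop the main prefix and the slash after it
relative <- substring(files, nchar(main) + 2)
# The first remaining path component is the project folder
project <- sub("/.*$", "", relative)
# Keep only files that sit under "<project>/Files for analysis/"
in_analysis <- grepl("^[^/]+/Files for analysis/", relative)

counts <- table(project[in_analysis])
counts
```

Because the grouping key is the project name rather than dirname(files), all rounds and boxes of a project end up in the same group.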