r, vcf-variant-call-format

Wildcards to read every file with the same extension in an R loop


I am trying to read some VCF files in R using a for loop. I loop through each sample ID in my sample list and create a variable for each sample, to which I assign the corresponding VCF.

Each sample has its own directory, named after the sample ID, which contains exactly one VCF file (whose name differs from the sample ID).

for(i in sampleList){
nam <- paste(i, '_vcf', sep="")
assign(nam, readVcf(i/*.vcf, 'hg19'))
}

The problem is that each VCF has a different name, which also differs from the sample ID, so I am not sure which command I should use to read it. I would like to use something like *.vcf, which would work in a bash script, for example. How can I do this in R?


Solution

  • list.files() returns the names of the files in a given directory.

    Let sampleList <- c(12345, 4711, 1337). Assuming your structure is something like

    O:/12345_vcf/secret1.vcf
    O:/4711_vcf/foo.vcf
    O:/1337_vcf/bar.vcf
    

    and you don't know the name of your files, but there is only one .vcf inside every directory.

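    library(VariantAnnotation)  # Bioconductor package that provides readVcf()

    for (i in sampleList) {
        directory <- paste0("O:/", i, "_vcf")
        # pattern is a regular expression; full.names = TRUE prepends the directory
        filename  <- list.files(directory, pattern = "\\.vcf$", full.names = TRUE)  # assumes exactly one .vcf per directory
        # one variable per sample, as in the question (e.g. `12345_vcf`)
        assign(paste0(i, "_vcf"), readVcf(filename))
    }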
    

    I don't know what hg19 means, so I left it out. Pass it as the second argument to readVcf(), as in your code, if necessary.
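
If you want something closer to the bash *.vcf wildcard, Sys.glob() expands shell-style patterns. The following is only a sketch under a few assumptions: the helper names find_single_vcf and read_sample_vcfs are made up for illustration, the directory layout is the one assumed above, and the demo uses readLines() on dummy files as a stand-in for readVcf() so that it runs without VariantAnnotation. It also stops with an error when a directory does not contain exactly one .vcf, and it collects the results in a named list instead of creating one variable per sample.

```r
# Hypothetical helper: return the single .vcf path inside one sample directory.
find_single_vcf <- function(directory) {
    # Sys.glob() expands shell-style wildcards, much like *.vcf in bash
    matches <- Sys.glob(file.path(directory, "*.vcf"))
    if (length(matches) != 1) {
        stop("expected exactly one .vcf in ", directory,
             ", found ", length(matches))
    }
    matches
}

# Hypothetical helper: read one file per sample into a named list.
# `reader` is any function that takes a file path, for example
# function(p) VariantAnnotation::readVcf(p, "hg19")
read_sample_vcfs <- function(sampleList, root, reader) {
    paths <- vapply(sampleList, function(id) {
        find_single_vcf(file.path(root, paste0(id, "_vcf")))
    }, character(1))
    vcfs <- lapply(paths, reader)
    names(vcfs) <- paste0(sampleList, "_vcf")
    vcfs
}

# Demo on throwaway directories; readLines() stands in for readVcf()
root       <- tempfile("vcfdemo")
sampleList <- c(12345, 4711, 1337)
fileNames  <- c("secret1.vcf", "foo.vcf", "bar.vcf")
for (k in seq_along(sampleList)) {
    directory <- file.path(root, paste0(sampleList[k], "_vcf"))
    dir.create(directory, recursive = TRUE)
    writeLines("##fileformat=VCFv4.2", file.path(directory, fileNames[k]))
}
vcfs <- read_sample_vcfs(sampleList, root, readLines)
```

With real data you would call something like read_sample_vcfs(sampleList, "O:", function(p) readVcf(p, "hg19")) and then access a sample with vcfs[["4711_vcf"]]. A named list is usually easier to loop over later than a set of variables created with assign().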