
How to merge .txt files from subfolders and name the columns after their parent folders using R?


I have performed an experiment under different conditions. Each of those conditions has its own folder. In each of those folders, there is a subfolder for each replicate that contains a text file called DistList.txt. The structure looks like this, where the folders "C1.1", "C1.2" and so on contain the mentioned .txt files:

enter image description here

Those .txt files look like this, but their length may vary from only one or two lines to several hundred:

enter image description here

Now, I would like to merge those .txt files and create a .csv file out of it in a way that it looks like this:

C1.1  C1.2  C1.3  ...
155   223   996
169   559   999
259   623   1033
2003        2220
4421

So far, I have been able to write a script that collects all the files and puts the data from each file into its own column, just as I want it. However, I would like the header of each column to be the name of the folder the .txt file was read from (e.g. C1.1, C1.2, C1.3, C2.1, ...).

This is my current script:

fileList <- list.files(path = ".", recursive = TRUE, pattern = "DistList.txt", full.names = TRUE)

listData <- lapply(fileList, read.table)

names(listData) <- gsub("DistList.txt","",basename(fileList))

library(tidyverse)
library(reshape2)

bind_rows(listData, .id = "FileName") %>%
  group_by(FileName) %>%
  mutate(rowNum = row_number()) %>%
  dcast(rowNum~FileName, value.var = "V1") %>%
  select(-rowNum) %>%
  write.csv(file="Result.csv")

This yields a .csv file in which the column headers are just numbers rather than the names I would like to have. Below is an extract of the resulting file, where I have marked the row that should contain the titles mentioned above (C1.1, C1.2, C1.3, ...):

enter image description here

Is there a way to name the columns as described above?


Solution

  • In this case, the line

    names(listData) <- gsub("DistList.txt","",basename(fileList))    
    

    has to be replaced by

    names(listData) <- basename(dirname(fileList))
    

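    so that the names of the subfolders are used as the column headers. The original line could not work: basename(fileList) is "DistList.txt" for every file, so removing that string leaves an empty name for each list element.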
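
    As a way to check the renaming without touching real data, here is a minimal sketch that builds a small throwaway folder tree and merges it. The layout (condition/replicate/DistList.txt), the folder name merge_demo and the sample values are assumptions made for illustration. It also swaps the tidyverse/reshape2 pipeline for a plain base R padding step, which is a different technique from the one in the question, though it should produce the same kind of table.

```r
# Build a throwaway folder tree (hypothetical layout:
# condition/replicate/DistList.txt, with made-up values)
root <- file.path(tempdir(), "merge_demo")
unlink(root, recursive = TRUE)  # start clean if the sketch is rerun
demo <- list("C1/C1.1" = c(155, 169, 259),
             "C1/C1.2" = c(223, 559),
             "C2/C2.1" = c(996))
for (sub in names(demo)) {
  dir.create(file.path(root, sub), recursive = TRUE)
  writeLines(as.character(demo[[sub]]),
             file.path(root, sub, "DistList.txt"))
}

# Find every DistList.txt below root (the pattern is a regular expression)
fileList <- list.files(path = root, recursive = TRUE,
                       pattern = "^DistList\\.txt$", full.names = TRUE)

listData <- lapply(fileList, read.table)

# dirname() drops the file name, basename() then keeps the innermost folder
names(listData) <- basename(dirname(fileList))

# Pad each column with NA up to the length of the longest file;
# indexing past the end of a vector yields NA
maxLen <- max(vapply(listData, nrow, integer(1)))
merged <- as.data.frame(lapply(listData, function(d) d$V1[seq_len(maxLen)]))

# Write empty cells instead of "NA" and skip the row-name column
write.csv(merged, file.path(root, "Result.csv"), row.names = FALSE, na = "")

print(merged)
```

    Folder names that are not valid R names (for example ones starting with a digit) may be altered by as.data.frame, so the headers are worth checking on real data.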