Search code examples
rdata-manipulationdata-management

Combining count tables with different row numbers to matrix - file names are column names


I often have to import multiple text files in R and combine them to a matrix for further analysis. Usually I deal with count data from microbiological studies with items in the first column and counts in the second column. The number of rows in the samples is not the same usually.

I would like to combine those single files to one matrix, where the rownames are the items of column 1 and columns display the counts of column 2. The column names of the final table should be the file names of the original single files.

Here is a minimal example: File 1: Sample1.csv

    [1] [2]
[1]  A , 3
[2]  B , 5
[3]  C , 7
[4]  D , 2
[5]. E , 1

File2: Sample2.csv

    [1] [2]
[1]  B , 1
[2]  C , 3
[3]  F , 4
[4]  G , 7

Final wanted output

  [Sample1] [Sample2]
[A]  3       NA
[B]  5       1
[C]  7       3
[D]  2       NA
[E]  1       NA
[F]  NA      4
[G]  NA      7

I understood that in this case you would have to start loading your files in R as a list

all_paths <-list.files(path = "/path/to/directory/with/.csv", pattern = "*.csv",full.names = TRUE)

all_content <-
  all_paths %>%
  lapply(read.table,
         header = FALSE, row.names =1,
         sep = ",",
         encoding = "UTF-8")

which gives me a list of dataframes with the items as row names and counts in the first column

I struggle now to combine this list to the above shown pattern. I tried rbind.fill (dpylr) and other functions from different posts but never succeeded.

Any help is highly appreciated!


Solution

  • Don't put the items as rownames, read them as first column. Put the data in a list and do a full join with Reduce.

    all_paths <-list.files(path = "/path/to/directory/with/.csv", pattern = "*.csv",full.names = TRUE)
    all_content <-lapply(all_paths, read.table,sep = ",", encoding = "UTF-8")
    result <- Reduce(function(x, y) merge(x, y, by = 'V1', all = TRUE), all_content)