Tags: r, matrix, read.table, write.table

Retaining the top-left cell of matrices in R


I've written a general-purpose script for averaging 'stacks' of matrices cell-wise. I write the averaged result out to a file, but the top-left cell (the header of the row-name column) is lost somewhere in the conversions between tables and matrices.

Is there some way to make R 'respect' this cell (top left), so that it will persist when I come to write out the file? I need to keep it for another script downstream.

I thought about just 'injecting' the cell back in at write time, but that feels messy, and to keep the script general I'd have to add another argparse argument. So far the only header-related options I've spotted are header = TRUE/FALSE in read.table and col.names in write.table, and neither seems to offer anything for the top-left cell.
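For comparison, the 'injecting' idea needs only a couple of calls. A minimal sketch, assuming the label ("Temp" here) is already known, for instance from an extra command-line argument; `write_with_corner` is a hypothetical helper name and the matrix is made-up data:

```r
# Hypothetical helper: write a matrix with row names, putting `corner`
# into the otherwise-empty top-left header cell.
write_with_corner <- function(m, file, corner, sep = "\t") {
  # Header line: corner label followed by the column names
  writeLines(paste(c(corner, colnames(m)), collapse = sep), file)
  # Body: row names plus values, appended without a second header
  write.table(m, file, sep = sep, quote = FALSE,
              col.names = FALSE, row.names = TRUE, append = TRUE)
}

# Made-up illustration data
m <- matrix(c(2, 1, 12, 5), nrow = 2,
            dimnames = list(c("20", "25"), c("Helix1", "Helix2")))
out <- tempfile(fileext = ".tsv")
write_with_corner(m, out, corner = "Temp")
cat(readLines(out), sep = "\n")
```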

Here's the code:

# Standard install if missing
list.of.packages <- c("argparse", "abind")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
for(i in list.of.packages){suppressMessages(library(i, character.only = TRUE))}


# Parse commandline arguments
parser <- ArgumentParser()
parser$add_argument('-i',
                    '--infiles',
                    nargs='+',
                    required=TRUE,
                    help="All the matrices to average.")
parser$add_argument('-s',
                    '--separator',
                    action='store',
                    default='\t',
                    help='The field separator for the input matrices (they should all match). [Default: tab].')
parser$add_argument('-o',
                    '--outfile',
                    action='store',
                    required=TRUE,
                    help='Output file to store the averaged matrix in.')

args <-parser$parse_args()

# Read each table, using the first column as row names
tables <- lapply(args$infiles, read.table, header=TRUE, row.names=1, check.names=FALSE, sep=args$separator)
matrices <- lapply(tables, as.matrix)
# Stack the matrices into a 3-D array and take the mean of each cell
stack <- abind(matrices, along=3)
stack_avg <- apply(stack, c(1,2), mean)
# Write file
write.table(stack_avg, args$outfile, sep=args$separator, col.names = NA, quote = FALSE)
cat("File written to: ", "\n", args$outfile, "\n")
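As an aside to the script above, the same cell-wise mean can be computed in base R without the abind dependency. A minimal sketch on made-up matrices, assuming every input has identical dimensions and the same row and column order:

```r
# Made-up "stack" of three 2x2 matrices with matching dimnames
dn <- list(c("20", "25"), c("Helix1", "Helix2"))
mats <- list(
  matrix(c(2, 1, 12, 5), 2, dimnames = dn),
  matrix(c(4, 1, 10, 5), 2, dimnames = dn),
  matrix(c(3, 4, 8, 2), 2, dimnames = dn)
)

# Element-wise sum of all matrices divided by their count = cell-wise mean;
# the dimnames of the result come from the first matrix
avg_reduce <- Reduce(`+`, mats) / length(mats)

# Same result via a 3-D array and apply(), mirroring the abind() version
stack <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)),
               dimnames = c(dn, list(NULL)))
avg_apply <- apply(stack, c(1, 2), mean)

print(avg_reduce)
all.equal(avg_reduce, avg_apply)
```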

The script's output starts like this, with an empty first header field:

    Helix1  Helix2  Strand1 Strand2 Turn    Unordered
20  8   8.25    18.25   9.5 13.75   36.25
....

But the desired output is (ignore the values for now):

Temp    Helix1  Helix2  Strand1 Strand2 Turn    Unordered
20  2.00    4.00    21.00   11.00   19.00   43.00
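The two header lines above differ only in their first field. A minimal sketch reproducing both from a made-up 2x2 result: `col.names = NA` (as in the script) writes an empty corner cell, while a data frame whose first column is an ordinary, named column writes that name. It assumes the label "Temp" is known or recovered separately:

```r
# Made-up averaged result with row names, standing in for stack_avg
stack_avg <- matrix(c(3, 2, 11, 6), nrow = 2,
                    dimnames = list(c("20", "25"), c("Helix1", "Helix2")))

# As in the script: row names are written, the corner cell is left empty
f_matrix <- tempfile()
write.table(stack_avg, f_matrix, sep = "\t", col.names = NA, quote = FALSE)

# Move the row names into a named first column, then drop row names
out_df <- data.frame(Temp = rownames(stack_avg), stack_avg,
                     check.names = FALSE)
f_df <- tempfile()
write.table(out_df, f_df, sep = "\t", quote = FALSE, row.names = FALSE)

readLines(f_matrix)[1]  # header line starts with an empty field
readLines(f_df)[1]      # header line starts with "Temp"
```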

An example input matrix might look like:

Temp    Helix1  Helix2  Strand1 Strand2 Turn    Unordered
20  2.00    12.00   19.00   11.00   11.00   23.00
25  1.00    5.00    21.00   10.00   18.00   46.00
30  1.00    4.00    21.00   10.00   17.00   45.00
35  1.00    5.00    24.00   11.00   18.00   40.00
40  1.00    5.00    21.00   100.00  19.00   43.00
45  1.00    3.00    25.00   11.00   18.00   42.00
50  1.00    4.00    23.00   11.00   19.00   41.00
55  1.00    4.00    19.00   10.00   19.00   46.00
60  1.00    5.00    18.00   11.00   22.00   42.00
65  1.00    5.00    200.00  11.00   22.00   41.00
70  2.00    4.00    20.00   11.00   20.00   43.00
75  2.00    5.00    15.00   10.00   23.00   44.00
80  2.00    5.00    16.00   10.00   22.00   45.00
85  1.00    4.00    19.00   11.00   21.00   44.00
90  2.00    4.00    20.00   11.00   20.00   44.00

Solution

  • I suspect your problem is at the read.table step. Try doing

    test_table_read <- read.table('one_of_your_tables', 
                                  header = TRUE, 
                                  row.names = 1, 
                                  check.names = FALSE,
                                  sep = '\t')
    

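    and have a look at View(test_table_read). I think the header for your row names will already be gone at this point: with row.names = 1, read.table moves the first column into the row names, and the row names of a data frame carry no header of their own.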

    Some things to consider:

    What purpose are your row names serving? Are they numeric, and if so, perhaps they should be in the data instead of being row names?

    Might this problem be better served by using data.frame instead of matrix?

    BTW, I was going to suggest giving a more minimal and reproducible example, but I think the problem does lie at the point where you read in the external data, which makes it a little more challenging to post. However, if I am wrong about that, can you reproduce the issue starting with the following set of matrices in place of your own? I do think all of the parsing arguments stuff is extraneous to the issue and could be edited out of your example.

    matrices <- lapply(split(mtcars, 1:4), as.matrix)
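Following the suggestion to treat the temperatures as data rather than row names, the label never has to be recovered. A minimal sketch under that assumption: it reads each table with the first column kept as data, checks that this column matches across inputs, averages only the remaining columns, and writes with `row.names = FALSE`. `average_tables` is a hypothetical name, and the inputs are made up and written to temporary files purely for illustration:

```r
# Hypothetical helper: cell-wise average of tables, keeping the first
# column (e.g. "Temp") as an ordinary, labelled column
average_tables <- function(files, sep = "\t") {
  tabs <- lapply(files, read.table, header = TRUE,
                 check.names = FALSE, sep = sep)
  key <- tabs[[1]][[1]]
  # The first column must line up row-for-row across all inputs
  stopifnot(all(vapply(tabs, function(tab) identical(tab[[1]], key),
                       logical(1))))
  # Cell-wise mean of the value columns only
  vals <- lapply(tabs, function(tab) as.matrix(tab[-1]))
  avg <- Reduce(`+`, vals) / length(vals)
  # Re-attach the first column under its original header
  out <- data.frame(key, avg, check.names = FALSE)
  names(out)[1] <- names(tabs[[1]])[1]
  out
}

# Made-up inputs written to temporary files
f1 <- tempfile(); f2 <- tempfile()
writeLines(c("Temp\tHelix1\tHelix2", "20\t2\t12", "25\t1\t5"), f1)
writeLines(c("Temp\tHelix1\tHelix2", "20\t4\t10", "25\t3\t7"), f2)

res <- average_tables(c(f1, f2))
outfile <- tempfile(fileext = ".tsv")
write.table(res, outfile, sep = "\t", quote = FALSE, row.names = FALSE)
cat(readLines(outfile), sep = "\n")
```

Keeping the first column as data means the same `names()` vector that was read in is the one written out, so nothing has to be passed through argparse.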