Tags: r, dplyr, mutate

How to create a new df with each "cell" mutated by a formula?


I have a dataset from which I am trying to calculate relative abundances. I would like to create a new df in which each "cell" has been transformed as follows: new value = (original value) / (sum of column). My dataset has several hundred columns and rows.

df <- structure(list(`raw$MTmatch` = c("Abies", "Abies lasiocarpa OR Abies grandis", "Acer", "Acer OR Acer platanoides", "Agrostis mertensii OR Agrostis exarata OR Poa annua OR Agrostis scabra OR Agrostis stolonifera" ), comb_S026401.R1 = c(0L, 4713L, 0L, 0L, 0L), comb_S026404.R1 = c(0L, 485L, 0L, 0L, 0L), comb_S026406.R1 = c(0L, 5626L, 0L, 0L, 0L), comb_S026409.R1 = c(0L, 2020L, 0L, 0L, 0L), comb_S026412.R1 = c(0L, 331L, 0L, 0L, 0L), comb_S026413.R1 = c(0L, 1394L, 0L, 0L, 0L), comb_S026414.R1 = c(0L, 21L, 0L, 0L, 0L), comb_S026415.R1 = c(0L, 0L, 0L, 10L, 0L), comb_S026416.R1 = c(0L, 271L, 0L, 0L, 0L ), comb_S026419.R1 = c(0L, 0L, 0L, 0L, 0L), comb_S026421.R1 = c(0L, 4689L, 0L, 47L, 0L), comb_S026422.R1 = c(0L, 95L, 0L, 0L, 0L), comb_S026423.R1 = c(0L, 360L, 0L, 0L, 0L), comb_S026427.R1 = c(0L, 666L, 0L, 0L, 0L), comb_S026428.R1 = c(0L, 0L, 0L, 0L, 0L ), comb_S026429.R1 = c(0L, 21L, 0L, 0L, 0L), comb_S026430.R1 = c(0L, 322L, 0L, 0L, 0L), comb_S026431.R1 = c(0L, 1545L, 0L, 9L, 0L), comb_S026432.R1 = c(0L, 2878L, 0L, 0L, 0L), comb_S026433.R1 = c(0L, 908L, 0L, 0L, 0L), comb_S026434.R1 = c(0L, 0L, 0L, 0L, 0L ), comb_S026435.R1 = c(0L, 1961L, 83L, 0L, 0L), comb_S026438.R1 = c(0L, 1944L, 0L, 0L, 0L), comb_S026440.R1 = c(0L, 1955L, 0L, 0L, 0L), comb_S026444.R1 = c(0L, 3372L, 0L, 0L, 0L), comb_S026447.R1 = c(0L, 0L, 0L, 0L, 0L), comb_S026450.R1 = c(0L, 0L, 564L, 0L, 0L ), comb_S026451.R1 = c(0L, 0L, 0L, 0L, 0L), comb_S026453.R1 = c(0L, 2721L, 0L, 0L, 0L), comb_S026456.R1 = c(8L, 10357L, 0L, 0L, 0L), comb_S026457.R1 = c(0L, 89L, 0L, 0L, 0L), comb_S026458.R1 = c(0L, 36L, 0L, 0L, 0L), comb_S026461.R1 = c(0L, 650L, 0L, 0L, 0L ), comb_S026462.R1 = c(0L, 3645L, 0L, 0L, 0L), comb_S026463.R1 = c(0L, 22L, 0L, 0L, 0L), comb_S026464.R1 = c(0L, 0L, 764L, 0L, 0L ), comb_S026466.R1 = c(0L, 0L, 0L, 0L, 0L), comb_S026467.R1 = c(0L, 0L, 0L, 0L, 0L), comb_S026469.R1 = c(0L, 797L, 0L, 0L, 0L ), comb_S026470.R1 = c(0L, 30L, 0L, 0L, 0L), comb_S026471.R1 = c(0L, 6519L, 0L, 0L, 0L), 
comb_S026473.R1 = c(0L, 26L, 0L, 0L, 0L), comb_S026474.R1 = c(0L, 0L, 2035L, 11L, 0L), comb_S026476.R1 = c(0L, 1181L, 0L, 0L, 0L), comb_S026477.R1 = c(0L, 53L, 0L, 0L, 10L), comb_S026483.R1 = c(0L, 0L, 0L, 0L, 0L), comb_S026484.R1 = c(0L, 170L, 0L, 0L, 0L), comb_S026485.R1 = c(0L, 37L, 0L, 0L, 0L ), comb_S026488.R1 = c(0L, 73L, 0L, 0L, 0L), comb_S026489.R1 = c(0L, 25L, 0L, 17L, 0L), comb_S026490.R1 = c(0L, 19L, 0L, 0L, 0L ), comb_S026493.R1 = c(0L, 3353L, 0L, 0L, 0L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame")) 

This is a small sample of my data.

I tried using various formulations of colSums. I also tried transposing the rows and columns and turning the result into a tibble:

rra <- t(df)
rra2 <- as_tibble(rra)

and then trying to get the sum of the rows using

rra3 <- rra2 %>%
  rowwise() %>%
  mutate(re = sum(c_across(.,))) %>%
  ungroup()

but this does not work because the top row holds the species names (a character vector), and I do not want to eliminate this row. Even once I figure out how to get the sum across rows, I still can't see how to use that sum to calculate the relative abundance across rows.


Solution

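  • This is really a simple task, and I think it is better to go with base R. Because of the one non-numeric variable, I use a couple of simple loops: the first computes the sum of each numeric column, and the second divides each numeric column by its sum. Code below: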

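    # TRUE for numeric columns, FALSE for the species-name column
    ind <- sapply(df, is.numeric)

    # Column sums; left as NA for the non-numeric column
    col.s <- rep(NA_real_, ncol(df))
    for (i in seq_along(df)) {
      if (ind[i]) col.s[i] <- sum(df[[i]])
    }

    # Divide each numeric column by its sum; other columns stay untouched
    for (i in seq_along(df)) {
      if (ind[i]) df[[i]] <- df[[i]] / col.s[i]
    }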
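
    Since the question is tagged dplyr, the same column-wise division can probably also be written with mutate() and across(). The following is only a sketch, assuming dplyr 1.0.0 or later; the table toy and its columns (taxon, s1, s2, s3) are made-up stand-ins for the real data, not part of the original post.

```r
library(dplyr)

# Toy stand-in for the real table; all names and values here are made up.
toy <- tibble(
  taxon = c("Abies", "Acer", "Poa"),
  s1 = c(1L, 3L, 0L),
  s2 = c(0L, 0L, 0L),   # a sample with no reads at all
  s3 = c(2L, 2L, 4L)
)

# Divide every numeric column by its own sum; the character column is
# skipped by where(is.numeric) and carried through unchanged.
rel <- toy %>%
  mutate(across(where(is.numeric), ~ .x / sum(.x)))
```

    This writes the result to a new object (rel) and leaves the input as it was, which matches the request for a new df. Note that a column whose sum is zero, such as comb_S026419.R1 in the sample data, comes out as NaN (0/0) with either approach, so such columns may need separate handling.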