Search code examples
rdataframemultiple-columns

How to take the row-wise maximum of columns with the same name


I have a dataframe in R with lots of columns and lots of rows with numeric values. I have multiple colums with the same name (e.g. A, A, B, B, C, C...) that should be combined into a single colum with the maximum of the rows (A,B,C)

So this

A A B B C C ...
0 2 1 3 1 1 ...
3 4 2 5 1 1 ...
... ... ... ... ... ... ...

Should be combined to

A B C ...
2 3 1 ...
4 5 1 ...
...

Can somebody help me with that?

I found an example online that is used to summarize row values of columns with the same name

df<-
data.frame(x=rpois(25,1),y=rpois(25,2),x=rpois(25,5),z=rpois(25,2),y=rpois(25,1),z=rpoi
s(25,5),check.names=FALSE)
Merged_df<-as.data.frame(do.call(cbind,
by(t(df),INDICES=names(df),FUN=colSums)))
Merged_df

and replaced FUN=colSums it with

 FUN = function(x) max(x,na.rm = TRUE)

but that resulted in an error

Error in tapply(seq_len(3635L), list(INDICES = NULL), function (x)  : 
  arguments must have same length

Now i am not sure if this is a problem in my data or in the function i was using as i am not experienced with R.


Solution

  • You can use split.default to split the columns according to their names, and then get the rowwise max with do.call and pmax, before binding the rows.

    dat <- data.frame(A = 1:2, A = c(0, 3), B = 3:4, B = 0:1, check.names = F)
    #   A A B B
    # 1 1 0 3 0
    # 2 2 3 4 1
    
    split.default(dat, colnames(dat)) |>
      lapply(do.call, what = "pmax") |>
      rbind.data.frame()
    #   A B
    # 1 1 3
    # 2 3 4