Search code examples
rdataframegroup-byaggregatetidy

Can I aggregate rows by under certain columns condition on R just in certain columns without changing the others?


Suppose you have a large df and you want a simple and fast way to get df1 from df (a large R dataframe):

df:

index  var1  var2  var3  var4
  0      2     4     8     7
  1      2     3     9     6
  2      1     5    10     8
  2      1     5     7     8
  2      2     9    33    10
  2      2     9    17    10
  3      3     6     6     9
  3      4     7    11    10

and the idea is to aggregate (by sum) the rows with the same index and the same var1 without changing the others. Notice that var4 is the same per index var2 combination.

df1: 

index  var1  var2  var3  var4
  0      2     4     8     7
  1      2     3     9     6
  2      1     5    17     8
  2      2     9    50    10
  3      3     6     6     9
  3      4     7    11    10

Solution

  • Maybe you can use aggregate() as below

    v <- aggregate(df[-(1:2)], df[1:2], function(x) sum(unique(x)))
    res <- v[order(v$index),]
    

    and thus

    > res
      index var1 var2 var3 var4
    2     0    2    4    8    7
    3     1    2    3    9    6
    1     2    1    5   17    8
    4     2    2    9   50   10
    5     3    3    6    6    9
    6     3    4    7   11   10