Tags: r, sum, rowsum

Is there a way to sum specific rows of a column?


I have a dataset (please see the attached file) in which I want to sum the numeric column 'tdiff' based on specific criteria, e.g. rows (1 + 2) and rows (3 + 4), but not rows 11, 12, 13, 14. I have tried the following, with no luck:

xx<- chaPe [rowSums(1:2, 3:4, 11, 12, 13, 14, 15:16),]
xx<- sum(chaPe $tdiff [c(1:2, 3:4, 11, 12, 13, 14, 15:16)],)

Basically, if you look at the column 'xsampa', only the 'tdiff' values belonging to 'p' and 'A' need to be summed.

The expected result for rows (1 + 2), for example, is 0.068 + 0.011 = 0.079. Also, how does the sum affect the values in the other columns, assuming they hold the same values apart from the column 'rn' (which is not really important)?

I am new to R, so any help would be great, as I cannot figure this out. Thanks.


Solution

  • You can start a new group whenever 'p' occurs, so that the first two rows form one group, the next two form another, and rows 11:14 stay as they are. For each group, sum the tdiff values into sum_tdiff. For the other columns, you decide which values to keep; the example below keeps the first value of Filename and Place in each group.

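    library(dplyr)
    
    # Start a new group at every 'p', then sum tdiff within each group
    result <- chaPe %>%
      group_by(grp = cumsum(xsampa == 'p')) %>%
      summarise(sum_tdiff = sum(tdiff),
                Filename = first(Filename),  # keep the first value per group
                Place = first(Place))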
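
To see how the cumsum() grouping behaves, here is a minimal sketch on made-up data, using base R only. The data frame chaPe_toy and its contents are assumptions for illustration: only the values 0.068 and 0.011 come from the question, and the real dataset may be laid out differently.

```r
# Hypothetical toy data; only 0.068 and 0.011 are taken from the question.
chaPe_toy <- data.frame(
  Filename = c("f1", "f1", "f1", "f1"),
  xsampa   = c("p", "A", "p", "A"),
  tdiff    = c(0.068, 0.011, 0.050, 0.020),
  stringsAsFactors = FALSE
)

# The comparison yields TRUE at every 'p'; cumsum() turns that into a
# running counter, so each 'p' row and the rows after it share one group id.
chaPe_toy$grp <- cumsum(chaPe_toy$xsampa == "p")

# Sum tdiff within each group (a base R stand-in for group_by + summarise).
result_toy <- aggregate(tdiff ~ grp, data = chaPe_toy, FUN = sum)
print(result_toy)
```

The first group sums to 0.079, matching the expected result in the question. Note that with this grouping every row between one 'p' and the next lands in the same group, so it is worth checking the grp column against the real data before summarising.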