Search code examples
rdplyraggregaterow

R: collapse rows and change value according to condition in data frame


I have a dataset:

structure(list(num = c(12L, 12L), code = structure(1:2, .Label = c("a", "b"), class = "factor"), ranking = c(2414.5, 2414.5), bottom = c(-0.0153795976572657, -0.00651997615327495), previous = c(0.00121016455715892, -0.000166609624290187), of_all_previous = c(-0.000570973882726524, -0.000771377162183913)), row.names = c(NA, -2L), class = "data.frame")

I would like to collapse the two rows according to num and ranking since they are the same but change code according to conditions where the two rows' values for columns bottom, previous, and of_all_previous should be compared sequentially (meaning: if they are equal then move on to the next column as in bottom -> previous -> of_all_previous) and choose the code that has the higher value.

In the case of the sample data provided, code b because -0.0065199761532749503 > -0.0153795976572656777 in the bottom column.

If they were equal then previous columns would have to be looked at.

I think that maybe dplyr can use the %>% to collapse rows but I cannot find how to change rows at the same time according to conditionality.

I expect the output to look something like this:

            num   code ranking 
1            12    a  2414      

Thanks in advance.


Solution

  • One approach might be to group_by both num and ranking to look at rows that have same values and need to be collapsed.

    Then, would arrange (sort) in descending order the columns where comparisons are made, including bottom previous and of_all_previous. When this is done, the higher row in the sorted data will consider the columns in order for comparison when there are ties.

    Finally, you can create a final_code with the first(code) based on the sorting.

    library(tidyverse)
    
    df %>%
      group_by(num, ranking) %>%
      arrange(-bottom, -previous, -of_all_previous) %>%
      mutate(final_code = ifelse(n() > 1, as.character(first(code)), as.character(code))) %>%
      slice(1) %>%
      select(num, code, ranking)