Search code examples
rdataframedplyr

How to perform operations on pairs of rows, based on a "distinguishing" column's values


I have a data frame with two rows for each unique combination of values across multiple index columns, which are distinguished by the values in another column. For example (and here the "index" values are color and hue, and the "distinguishing" value is level:

color = rep(c("r", "g", "b"), each=4) 
hue = rep(rep(c("l", "d"), each=2), 3) 
level = rep(1:2, 6) 
val = c(2, 4, 4, 8, 1, 9, 4, 7, 3, 4, 2, 9) 
df = data.frame(color, hue, level, val) 
df
   color hue level val
1      r   l     1   2
2      r   l     2   4
3      r   d     1   4
4      r   d     2   8
5      g   l     1   1
6      g   l     2   9
7      g   d     1   4
8      g   d     2   7
9      b   l     1   3
10     b   l     2   4
11     b   d     1   2
12     b   d     2   9

In the above example, I'd like to create a new data frame containing, for each unique combination of color and hue, the difference between the vals from the "level 2" row and and the "level 1" row .

That is (created by hand):

dfDif
  color hue dif
1     r   l   2
2     r   d   4
3     g   l   8
4     g   d   3
5     b   l   1
6     b   d   7 

Ideally, I'd like to use dplyr, but that's not a requirement by any means.


Solution

  • You can try this:

    rm(list = ls())
    library(tidyverse)
    
    color = rep(c("r", "g", "b"), each=4) 
    hue = rep(rep(c("l", "d"), each=2), 3) 
    level = rep(1:2, 6) 
    val = c(2, 4, 4, 8, 1, 9, 4, 7, 3, 4, 2, 9) 
    df = data.frame(color, hue, level, val) 
    df
    
    df_t=pivot_wider(df, names_from = c("level"), values_from = c("val"))
    df_t
    
    # df_t[, 'dif'] <- df_t[, 4] - df_t[, 3]
    df_t[c("1","2","dif")] <- list(NULL, NULL, df_t[["2"]] - df_t[["1"]]) # From user thelatemail