Search code examples
rstatisticsdata-scienceanalytics

How to calculate difference in values grouped by 2 separate variables in R


Let's say we have a team variable, but we also have a time period 1 and a time period 2 variable, and a numeric grade 1-10. I want to mutate and add a variable that calculates the difference from time period 1 to time period 2.

How do I do this?

Visually the table looks like this: img


Solution

  • There is a neat function in the data.table package called dcast( ) that allows you to transform your data from long to wide. In this case, you can use the Period variable to create 2 new columns, Period 1 and Period 2, where the values are the Grades.

    library(data.table)
    
    > data <- data.table(
    +   Team = c("Team 1","Team 1","Team 2","Team 2","Team 3","Team 3"),
    +   Period = c("Period 1","Period 2","Period 1","Period 2","Period 1","Period 2"),
    +   Grade = c(75,87,42,35,10,95))
     
    > data
         Team   Period Grade
    1: Team 1 Period 1    75
    2: Team 1 Period 2    87
    3: Team 2 Period 1    42
    4: Team 2 Period 2    35
    5: Team 3 Period 1    10
    6: Team 3 Period 2    95
     
    > data2 <- dcast(
    +   data = data,
    +   Team ~ Period,
    +   value.var = "Grade")
     
    > data2
         Team Period 1 Period 2
    1: Team 1       75       87
    2: Team 2       42       35
    3: Team 3       10       95
    
    > data2 <- data2[,Difference := `Period 2` - `Period 1`]
     
    > data2
         Team Period 1 Period 2 Difference
    1: Team 1       75       87         12
    2: Team 2       42       35         -7
    3: Team 3       10       95         85