
Using match on multiple criteria to generate value in R


I currently have the following data format:

df = data.frame(c(rep("A", 12), rep("B", 12)), rep(1:12, 2), seq(-12, 11))
colnames(df) = c("station", "month", "mean")
df

df_master = data.frame(c(rep("A", 10), rep("B", 10)), rep(c(27:31, 1:5), 2), rep(c(rep(1, 5), rep(2, 5)), 2), rep(seq(-4,5), 2))
colnames(df_master) = c("station", "day", "month", "value")
df_master

Effectively, df holds the monthly mean value for each station. I want to add a new variable to df_master that gives, for each daily observation, the difference from the monthly mean. I have managed to do this with an overall average including all the data, but since the mean values vary between stations, I would like to make the new variable station-specific.

I have tried the following code to match the monthly value, but it doesn't account for differences between stations:

df_master$mean = df$mean[match(df_master$month, df$month)]
df_master = df_master %>% mutate(diff = value - mean)

How can I progress this further so that the averages are taken per station?


Solution

  • With dplyr using a left join

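    library(dplyr)

    # Join on both keys so each daily row picks up its own station's monthly mean
    left_join(df_master, df, by = c('station', 'month')) %>%
            mutate(monthdiff = value - mean) %>%  # daily value minus monthly mean
            select(-mean)                         # drop the joined helper column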
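
For anyone who prefers to stay with match, as in the question's attempt, the following is a minimal base R sketch rather than part of the original answer. match returns the position of the first match only, so matching on month alone always picks up station A's mean. One way around this is to match on a combined station and month key. The names key_master, key_means and idx are illustrative, and the "_" separator assumes station names contain no underscore.

```r
# Rebuild the example data from the question
df <- data.frame(station = rep(c("A", "B"), each = 12),
                 month   = rep(1:12, 2),
                 mean    = seq(-12, 11))

df_master <- data.frame(station = rep(c("A", "B"), each = 10),
                        day     = rep(c(27:31, 1:5), 2),
                        month   = rep(rep(1:2, each = 5), 2),
                        value   = rep(seq(-4, 5), 2))

# Build a combined station-month key for each table
# (assumption: "_" does not occur in any station name)
key_master <- paste(df_master$station, df_master$month, sep = "_")
key_means  <- paste(df$station, df$month, sep = "_")

# For each daily row, find the row of df with the same station and month
idx <- match(key_master, key_means)

# Daily value minus the station-specific monthly mean
df_master$monthdiff <- df_master$value - df$mean[idx]
```

If a station and month combination in df_master has no counterpart in df, idx is NA for that row and so is monthdiff, which mirrors what the left join does.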