Search code examples
rdplyrlongitudinal

Creating new variables with links between cases via id's (longitudinal data)


I have a dataframe with two persons in three points of time (3x "id" == 1 and 3x "id" == 2):

id <- c(1, 1, 1, 2, 2, 2)
id2 <- c(NA, NA, NA, 1, 1, 1)
x <- c(4, 5, 5, 1, 1, 1)
dat1 <- data.frame(id, id2, x)
dat1

  id id2 x
1  1  NA 4
2  1  NA 5
3  1  NA 5
4  2   1 1
5  2   1 1
6  2   1 1

Now i want to create a new variable "y" with following rule: If "id2" is not NA, "y" should be the value of "x" that occurs most often for the person with "id2" == "id". In this example data: For all points in time, the person with "id" == 2 gets a 5 in "y", because person 2 has a 1 in "id2" and 5 is the number that occurs most often for the person with "id" == 1. Since "id2" is NA for person 1, "y" will be NA aswell (there is no other person to refer to for person 1). Result is:

  id id2 x y
1  1  NA 4 NA
2  1  NA 5 NA
3  1  NA 5 NA
4  2   1 1 5
5  2   1 1 5
6  2   1 1 5

Is there a way to do this with dplyr?


Solution

  • We may find the Mode grouped by 'id', then match the 'id2' with 'id' and replace with the 'Mode' values

    library(dplyr)
    dat1 %>% 
        group_by(id) %>%
        mutate(tmp = Mode(x)) %>% 
        ungroup %>%
        mutate(y= tmp[match(id2, id)], tmp = NULL)
    

    where

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }