Search code examples
rdataframenestedextract

In R: How to extract a specific (e.g. last) value from a dataframe with multiple rows belonging to one person?


I have a dataframe that contains variables from a collection of follow-up visits from patients after transplantation.

For sake of simplicity the number of variables are 2: 1) the patient identification, 2) the number of days from the transplantation up until the follow-up event.

Each row is a follow-up visit. There can be multiple follow-ups for each patient in the data frame. The amount of follow-ups for each patient vary. The days after transplantation when these follow-up visits happen vary as well.

I would like to extract the number of days from the last follow up of each patient and write it in a separate column in every follow-up observation of the patient.

In the real dataset the amount of patients is around 15,000. I tried to extract the values from a nested dataframe, but it was not possible for me.

Example:

patient_ID <- c("A", "A", "A", "A", "B", "B", "C", "C", "C")

days_tx_followup <- c(0, 5, 10, 15, 2, 4, 1, 2, 3)

df <- data.frame(patient_ID, days_tx_followup)

  patient_ID days_tx_followup
1          A                0
2          A                5
3          A               10
4          A               15
5          B                2
6          B                4
7          C                1
8          C                2
9          C                3

What I would like to have:

  patient_ID days_tx_followup last_followup
1          A                0            15
2          A                5            15
3          A               10            15
4          A               15            15
5          B                2             4
6          B                4             4
7          C                1             3
8          C                2             3
9          C                3             3

Solution

  • Thankfully dplyr has a function called last that can do just this.

    df %>%
      group_by(patient_ID) %>%
      mutate(
        last_followup = last(days_tx_followup)
      )
    #> # A tibble: 9 × 3
    #> # Groups:   patient_ID [3]
    #>   patient_ID days_tx_followup last_followup
    #>   <chr>                 <dbl>         <dbl>
    #> 1 A                         0            15
    #> 2 A                         5            15
    #> 3 A                        10            15
    #> 4 A                        15            15
    #> 5 B                         2             4
    #> 6 B                         4             4
    #> 7 C                         1             3
    #> 8 C                         2             3
    #> 9 C                         3             3
    

    Created on 2022-08-23 by the reprex package (v2.0.1)