Search code examples
rdplyrdata-manipulation

r difference in each observation within Id


Assuming I have a dataset like this

id   time       cd4  sequence
1   -0.741958   548  1 
1   -0.246407   893  2
1    0.243669   657  3
        
2   -2.7296369  464  1
2   -2.2505131  845  2
2   -0.221766   752  3
2    0.221766   459  4
2    0.77481198 181  5
2    1.256673   434  6

I like to compute the difference in cd4 values , the earliest value with every other value , within that ID.

ID 1, has 3 observations, earliest is at time -0.741, The code should estimate the difference of cd4 values ,

sequence 1 vs sequence 2 : 893 - 548
sequence 1 vs sequence 3 : 657 - 548
sequence 2 vs sequence 4 : 657 - 893

Id 2 has 6 observations, this is 6Choose2, which is 15 different combinations

So expecting a dataset like this

id  time        cd4 seq comb    cd4_lag
1   -0.74195    548 1   1,1     0
1   -0.24640    893 2   1,2     345
1   0.243669    657 3   1,3     109
1                       2,3    -236
2   -2.72963    464 1   1,1 
2   -2.25051    845 2   1,2     381
2   -0.22176    752 3   1,3     288
2   0.221766    459 4   1,4     -5
2   0.774811    181 5   1,5     -283
2   1.256673    434 6   1,6     -30
                        2,3     -93
                        2,4     -386
                        2,5     -664
                        2,6     -411
                        3,4     -293
                        3,5     -571
                        3,6     -318
                        4,5     -278
                        4,6     -25
                        5,6      253

Thanks for any suggestions , advise.


Solution

  • Maybe something like this? I have left out the 1,1 self matches since it wasn't clear to me why those should be included but no other self matches.

    library(dplyr)
    df |>
      inner_join(df |> select(-time),
                join_by(id, sequence < sequence)) |>
      mutate(cd4_lag = cd4.y - cd4.x)
    

    Result

       id      time cd4.x sequence.x cd4.y sequence.y cd4_lag
    1   1 -0.741958   548          1   893          2     345
    2   1 -0.741958   548          1   657          3     109
    3   1 -0.246407   893          2   657          3    -236
    4   2 -2.729637   464          1   845          2     381
    5   2 -2.729637   464          1   752          3     288
    6   2 -2.729637   464          1   459          4      -5
    7   2 -2.729637   464          1   181          5    -283
    8   2 -2.729637   464          1   434          6     -30
    9   2 -2.250513   845          2   752          3     -93
    10  2 -2.250513   845          2   459          4    -386
    11  2 -2.250513   845          2   181          5    -664
    12  2 -2.250513   845          2   434          6    -411
    13  2 -0.221766   752          3   459          4    -293
    14  2 -0.221766   752          3   181          5    -571
    15  2 -0.221766   752          3   434          6    -318
    16  2  0.221766   459          4   181          5    -278
    17  2  0.221766   459          4   434          6     -25
    18  2  0.774812   181          5   434          6     253