Search code examples
rapplysubtraction

R Group data and subtract rows across all columns, based on specific criteria


This is a subset of my data:

structure(list(First.Name = c(5006L, 5006L, 5007L, 5007L, 5008L, 
5009L), Session = c("Post", "Pre", "Post", "Pre", NA, "Post"), 
    RHR = c(65.2352941176471, 60, 62.5882352941176, 63, 63.4, 
    48.6060606060606), HRV = c(79.1470588235294, 73.5, 91.4117647058823, 
    80.5555555555556, 102.4, 146.606060606061), Hours.in.Bed = c(6.76441176470588, 
    6.325, 5.98058823529412, 4.86, 6.503, 5.43787878787879), 
    Hours.of.Sleep = c(5.88058823529412, 5.59833333333333, 4.89117647058824, 
    3.93666666666667, 5.933, 5.10484848484848), Sleep.Disturbances = c(6.85294117647059, 
    6.66666666666667, 4.52941176470588, 3.55555555555556, 5.2, 
    2.93939393939394), Latency.min = c(6.96558823529412, 3.31333333333333, 
    3.77411764705882, 2.81333333333333, 2.88, 2.90424242424242
    ), Cycles = c(5.73529411764706, 5.83333333333333, 3.23529411764706, 
    2.22222222222222, 5, 3.33333333333333), REM.Sleep.hours = c(1.42970588235294, 
    1.55, 0.466470588235294, 0.413333333333333, 1.42, 0.698181818181818
    ), Deep.Sleep.hours = c(0.612058823529412, 0.55, 1.17352941176471, 
    0.972222222222222, 0.68, 1.73909090909091), Light.Sleep.hours = c(3.83647058823529, 
    3.49666666666667, 3.25058823529412, 2.55111111111111, 3.835, 
    2.66636363636364), Awake.hours = c(0.881764705882353, 0.723333333333333, 
    1.08764705882353, 0.92, 0.568, 0.333030303030303), Missing.Data.hours = c(0, 
    0, 0, 0, 0, 0), Respiratory.Rate = c(NaN, NaN, NaN, NaN, 
    NaN, NaN), Year_Day = c(147.852941176471, 127.5, 145.117647058824, 
    129.888888888889, 130.5, 146), Week_Year = c(21.5588235294118, 
    18.6666666666667, 21.1764705882353, 19, 19.1, 21.2727272727273
    )), row.names = c(NA, -6L), groups = structure(list(First.Name = 5006:5009, 
    .rows = structure(list(1:2, 3:4, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, 4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

which looks like:

First.Name Session   RHR   HRV Hours.in.Bed Hours.of.Sleep Sleep.Disturbances Latency.min Cycles REM.Sleep.hours Deep.Sleep.hours Light.Sleep.hou~
       <int> <chr>   <dbl> <dbl>        <dbl>          <dbl>              <dbl>       <dbl>  <dbl>           <dbl>            <dbl>            <dbl>
1       5006 Post     65.2  79.1         6.76           5.88               6.85        6.97   5.74           1.43             0.612             3.84
2       5006 Pre      60    73.5         6.32           5.60               6.67        3.31   5.83           1.55             0.55              3.50
3       5007 Post     62.6  91.4         5.98           4.89               4.53        3.77   3.24           0.466            1.17              3.25
4       5007 Pre      63    80.6         4.86           3.94               3.56        2.81   2.22           0.413            0.972             2.55
5       5008 NA       63.4 102.          6.50           5.93               5.2         2.88   5              1.42             0.68              3.84
6       5009 Post     48.6 147.          5.44           5.10               2.94        2.90   3.33           0.698            1.74              2.67

I am trying to subtract specific rows, across all columns, based on the Session column. Specifically, Post - Pre across columns per First.Name ID. However, some IDs are missing either the Pre or Post values, or both.

So for example: The column RHR for subject 5006 will be Post - Pre or 65.2-60 and so on across the columns.

I have tried variations of

DF %>% 
  group_by(First.Name) %>% 
  summarise(RHR[Session == "Post"] - RHR[Session == "Pre"])

But I'm sure there is a way to use summarize or apply functions that don't require mutating new columns for the difference in values. Help appreciated.


Solution

  • Maybe use summarize across to get all the rows at once.

    summarize(df%>%group_by(First.Name), across(RHR:Week_Year, function(x) {return(x[1]-x[2])}))
    
      First.Name    RHR   HRV Hours.in.Bed Hours.of.Sleep Sleep.Disturbances Latency.min  Cycles
           <int>  <dbl> <dbl>        <dbl>          <dbl>              <dbl>       <dbl>   <dbl>
    1       5006  5.24   5.65        0.439          0.282              0.186       3.65  -0.0980
    2       5007 -0.412 10.9         1.12           0.955              0.974       0.961  1.01  
    3       5008 NA     NA          NA             NA                 NA          NA     NA     
    4       5009 NA     NA          NA             NA                 NA          NA     NA