
ddply lag with multiple subsets


I believe ddply is the tool I need for my task, but I'm having difficulty getting the correct results. I've spent a number of hours reading about ddply and experimenting with different code, but I haven't gotten any further on my own. Here is an example data frame:

station <- c(rep("muc",13), rep("nbw", 17))
year <- c(rep(1994,4),rep(1995,4),rep(1996,5),rep(1994,5), rep(1995,4), rep(1996,4), rep(1997, 4))
depth <- c(rep(c("HUM","31-60","61-90","91-220"),2), rep(c("HUM","0-30", "31-60","61-90","91-220"),2),rep(c("HUM","0-30", "31-60","91-220"),1),rep(c("HUM","0-30", "31-60","61-90"),2))
doc <- c(80, 10, 3, 2,70, 15, 5, 5,70, 20, 5, 5, 2, 40, 10, 3, 2, 1,50, 15, 5, 2, 45, 20, 2, 1,35, 8, 2, 1)

df <- data.frame(station, year, depth, doc)
df

Depth refers to soil depth (HUM = humus layer) and doc is the dissolved organic carbon (DOC) measured at that depth. Note that not every year has doc measurements and some depth classes are missing. This is annoying but comes up often in my data set. With ddply I would like to add a column to this data frame so that, for each depth, the doc of the soil layer directly above is returned; for HUM, NA should be given since nothing lies on top of the humus layer. As an example:

depth   doc  doc_m1
HUM     80   NA
31-60   10   80
61-90   3    10
91-220  2    3

In the data frame this should of course be calculated for every year and every depth. I'd like to avoid which and for loops, and ddply seems suited to this, but I haven't had any luck getting a lag command to work with ddply. This is as far as I got with the code (obviously not very far):

doc <- ddply(df, .(year), transform,
      doc_m1 = ????)

Does anyone have a suggestion? Thanks in advance!


Solution

  • If your depths are already in the right order in your data set (as they are in your example), you could just do:

    library(plyr)
    # Lag doc by one row within each station/year group; the top layer gets NA.
    doc2 <- ddply(df, .(station, year), transform,
                  doc_m1 = c(NA, doc[-length(doc)]))
    

    Note that I also grouped on station, so values are not carried over from one station to the next. This gives:

    > head(doc2, 10)
       station year  depth doc doc_m1
    1      muc 1994    HUM  80     NA
    2      muc 1994  31-60  10     80
    3      muc 1994  61-90   3     10
    4      muc 1994 91-220   2      3
    5      muc 1995    HUM  70     NA
    6      muc 1995  31-60  15     70
    7      muc 1995  61-90   5     15
    8      muc 1995 91-220   5      5
    9      muc 1996    HUM  70     NA
    10     muc 1996   0-30  20     70
    

    If the rows aren't already sorted by depth, make depth a factor with its levels in top-to-bottom order and sort the data frame by it (within station and year) first. The same approach then works.
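
A minimal sketch of that sorting step, assuming the layers run HUM, 0-30, 31-60, 61-90, 91-220 from top to bottom. The names `df_demo`, `depth_levels` and `df_sorted` are made up for this illustration; the values are the muc 1995 and 1996 rows from the question, deliberately shuffled:

```r
library(plyr)  # provides ddply() and .()

# Shuffled sample rows (muc, 1995 and 1996) taken from the question's data.
df_demo <- data.frame(
  station = "muc",
  year    = c(1996, 1995, 1996, 1995, 1996, 1995, 1996, 1995, 1996),
  depth   = c("61-90", "91-220", "HUM", "31-60", "91-220",
              "HUM", "0-30", "61-90", "31-60"),
  doc     = c(5, 5, 70, 15, 2, 70, 20, 5, 5)
)

# Assumed top-to-bottom order of the soil layers.
depth_levels <- c("HUM", "0-30", "31-60", "61-90", "91-220")
df_demo$depth <- factor(df_demo$depth, levels = depth_levels)

# Sort by station, year, then depth; a factor sorts by its level order,
# not alphabetically.
df_sorted <- df_demo[order(df_demo$station, df_demo$year, df_demo$depth), ]

# Same lag as above, now applied to the sorted rows.
doc2 <- ddply(df_sorted, .(station, year), transform,
              doc_m1 = c(NA, doc[-length(doc)]))
doc2
```

With this sample, doc_m1 should come out as NA, 70, 15, 5 for 1995 and NA, 70, 20, 5, 5 for 1996. If a layer is missing in a given year (as 0-30 is in 1995), the lag simply takes the next layer that is present above it, which matches the example in the question.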