R: Create a new column based on a non-consecutive condition in a data.frame


Consider an artificial data frame:

IDtest<-c(1,1,1,1,1,1,2,2,2,3,3,3,3)
Class<-c(1,1,3,4,4,5,1,1,2,2,2,3,4)
Day<-c(0,47,76,100,150,173,0,47,76,0,47,76,100)
Area<-c(0.45,0.85,1.50,1.53,1.98,5.2,
         0.36,0.58,1.2,
         0.85,1.36,2.26,3.59)
df<-data.frame(cbind(IDtest, Class, Day, Area))
df

   IDtest Class Day Area
1       1     1   0 0.45
2       1     1  47 0.85
3       1     3  76 1.50
4       1     4 100 1.53
5       1     4 150 1.98
6       1     5 173 5.20
7       2     1   0 0.36
8       2     1  47 0.58
9       2     2  76 1.20
10      3     2   0 0.85
11      3     2  47 1.36
12      3     3  76 2.26
13      3     4 100 3.59

I'd like to compute:
1) For IDtest 1 in Class 1: step1 = 47 - 0
2) For IDtest 1 in Class 3: step1 = 76 - 47
3) For IDtest 1 in Class 4: step1 = 150 - 76
4) For IDtest 1 in Class 5: step1 = 173 - 150

and so on, up to IDtest 3.

For this I tried:

df$step1 <- NA
for (i in 1:max(df$Class)) {
  if (i == 1) {
    df$step1[df$Class == i] <- max(df$Day[df$Class == i]) - 0 # quite silly
  } else {
    # "Last" as the "previous" Class, not inside the same Class
    df$step1[df$Class == i] <- max(df$Day[df$Class == i]) - max(df$Day[df$Class == i - 1])
  }
}

This works when the Class values are consecutive, but here Class jumps from 1 to 3. In that case my code gives -Inf values, because it looks for Class 2, which doesn't exist, when it should use the last Class actually present (1).
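To make the failure concrete, here is a minimal sketch using only the IDtest 1 rows from the example. The names `day1`, `class1`, `prev_max` and `prev_class` are made up for this illustration. It shows that `max()` on an empty subset returns -Inf, and that looking up the largest Class actually present below the current one avoids the gap.

```r
# Day and Class values for IDtest 1 only, copied from the example data
day1   <- c(0, 47, 76, 100, 150, 173)
class1 <- c(1, 1, 3, 4, 4, 5)

# Class 2 never occurs here, so the subset is empty and
# max() returns -Inf (R also emits a warning, suppressed here)
prev_max <- suppressWarnings(max(day1[class1 == 2]))
prev_max

# Assumed fix: use the largest Class present below 3 instead of 3 - 1
prev_class <- max(class1[class1 < 3])
step_class3 <- max(day1[class1 == 3]) - max(day1[class1 == prev_class])
step_class3
```

Here `prev_class` is 1, so the step for Class 3 is computed as 76 - 47.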

My desired output is:

new.df

   IDtest Class Day Area step1
1       1     1   0 0.45 47
2       1     1  47 0.85 47
3       1     3  76 1.50 29
4       1     4 100 1.53 74
5       1     4 150 1.98 74
6       1     5 173 5.20 23

Do you see any simple modification here?


Solution

  • I am not sure if this is what you are after, but you could try:

    merge(df,
      within(
        aggregate(Day ~ IDtest + Class, df, max),
        step1 <- ave(Day, IDtest, FUN = function(x) diff(c(0, x)))
      ),
      by = c("IDtest", "Class"),
      all = TRUE
    )
    

    which gives (Day.x is the original Day, Day.y is the maximum Day per IDtest and Class)

       IDtest Class Day.x Area Day.y step1
    1       1     1     0 0.45    47    47
    2       1     1    47 0.85    47    47
    3       1     3    76 1.50    76    29
    4       1     4   100 1.53   150    74
    5       1     4   150 1.98   150    74
    6       1     5   173 5.20   173    23
    7       2     1     0 0.36    47    47
    8       2     1    47 0.58    47    47
    9       2     2    76 1.20    76    29
    10      3     2     0 0.85    47    47
    11      3     2    47 1.36    47    47
    12      3     3    76 2.26    76    29
    13      3     4   100 3.59   100    24
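
If the extra Day.x and Day.y columns are unwanted, a variation on the same idea is sketched below. It is not the answer's original code: it keeps the `aggregate()` and `ave()` steps but swaps `merge()` for a `match()` lookup, so `df` keeps its original columns and row order. The names `last_day` and `key` are hypothetical helpers introduced for this sketch, and it assumes that within each IDtest the Day values increase with Class, as in the example.

```r
# Rebuild the example data from the question
df <- data.frame(
  IDtest = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Class  = c(1, 1, 3, 4, 4, 5, 1, 1, 2, 2, 2, 3, 4),
  Day    = c(0, 47, 76, 100, 150, 173, 0, 47, 76, 0, 47, 76, 100),
  Area   = c(0.45, 0.85, 1.50, 1.53, 1.98, 5.2,
             0.36, 0.58, 1.2,
             0.85, 1.36, 2.26, 3.59)
)

# Last Day of each (IDtest, Class) block
last_day <- aggregate(Day ~ IDtest + Class, df, max)

# Sort explicitly so diff() runs in Class order within each IDtest
last_day <- last_day[order(last_day$IDtest, last_day$Class), ]

# Difference to the previous block's last Day; the first block starts from 0
last_day$step1 <- ave(last_day$Day, last_day$IDtest,
                      FUN = function(x) diff(c(0, x)))

# Look up step1 for every original row via a combined IDtest/Class key
key <- function(d) paste(d$IDtest, d$Class)
df$step1 <- last_day$step1[match(key(df), key(last_day))]
df
```

Because `diff()` only looks at the classes that actually occur for each IDtest, gaps such as the jump from Class 1 to Class 3 need no special handling.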