r, dataframe, binning

Making fixed-interval bins that restart within each group of a column in R


I am trying to make bins based on a specific time interval and I want the bins to restart counting when the trial number changes. Here is sample data:

myData <- structure(list(Trial_Nr = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L), seconds = c(1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 
3, 3.25, 3.5, 3.75)), .Names = c("Trial_Nr", "seconds"), class = "data.frame", row.names = c(NA, 
-12L))

Here is what the dataset looks like:

   Trial_Nr seconds
1         1    1.00
2         1    1.25
3         1    1.50
4         1    1.75
5         1    2.00
6         1    2.25
7         2    2.50
8         2    2.75
9         2    3.00
10        2    3.25
11        2    3.50
12        2    3.75

My goal is to make 0.5-second bins within each trial, with the bin count starting over at the start of each new trial. Just FYI: the real dataset has many more data points, and the values in the seconds column are not evenly spaced. This is my goal:

   Trial_Nr seconds bin
1         1    1.00   1
2         1    1.25   1
3         1    1.50   2
4         1    1.75   2
5         1    2.00   3
6         1    2.25   3
7         2    2.50   1
8         2    2.75   1
9         2    3.00   2
10        2    3.25   2
11        2    3.50   3
12        2    3.75   3

I have tried the cut function and was able to cut by intervals, but I couldn't figure out how to account for the trial number. Thank you for all your help!


Solution

  • A simple tapply would do it:

    myData$bin <- unlist(tapply(myData$seconds, myData$Trial_Nr, function(x) (x-min(x)) %/% 0.5 + 1))
    
    > myData
       Trial_Nr seconds bin
    1         1    1.00   1
    2         1    1.25   1
    3         1    1.50   2
    4         1    1.75   2
    5         1    2.00   3
    6         1    2.25   3
    7         2    2.50   1
    8         2    2.75   1
    9         2    3.00   2
    10        2    3.25   2
    11        2    3.50   3
    12        2    3.75   3
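
    To see why this works, here is a minimal sketch of the same expression applied to a single trial by hand. The vector x and the intermediate name elapsed are only illustrative; the values are the Trial 2 timestamps from the sample data.

    ```r
    # Illustrative values: the Trial 2 timestamps from the question's sample data
    x <- c(2.5, 2.75, 3, 3.25, 3.5, 3.75)

    # Time elapsed since the first observation of the trial
    elapsed <- x - min(x)

    # %/% is integer division: how many whole 0.5 s intervals have elapsed.
    # Adding 1 makes the first bin number 1 instead of 0.
    bin <- elapsed %/% 0.5 + 1
    bin
    ```

    Note that the bins are anchored at each trial's first timestamp, not at zero or at a fixed clock time, which is what makes the count restart for every trial.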
    

    EDIT:

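    If the rows are not sorted by Trial_Nr, unlist(tapply(...)) returns the values grouped by trial, so they no longer line up with the original rows. In that case you can do it step by step with split and unsplit, which puts the rows back in their original order: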

    dat <- split(myData, myData$Trial_Nr)
    dat <- lapply(dat, function(x) {x$bin <- (x$seconds-min(x$seconds)) %/% 0.5 + 1; x})
    dat <- unsplit(dat, myData$Trial_Nr)
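
    As a possible alternative that is not part of the original answer, base R's ave() applies a function within each group and returns the results in the original row positions, so it should also cope with unordered trials in a single call. A minimal sketch, assuming the question's sample data; the shuffled data frame is a hypothetical example built only to show that row order is preserved:

    ```r
    # Sample data from the question
    myData <- data.frame(
      Trial_Nr = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
      seconds  = c(1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 3.75)
    )

    # Hypothetical unordered version: rows of the two trials interleaved
    shuffled <- myData[c(7, 1, 8, 2, 9, 3, 10, 4, 11, 5, 12, 6), ]

    # ave() computes the bins per Trial_Nr and writes each result back
    # to the position of the row it came from
    shuffled$bin <- ave(shuffled$seconds, shuffled$Trial_Nr,
                        FUN = function(x) (x - min(x)) %/% 0.5 + 1)
    shuffled
    ```

    FUN has to be passed by name here, because any unnamed arguments after the first are treated as grouping variables.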