R: cut a data frame by factors


Let's say I have this data frame:

+-------+-----+------+
| Month | Day | Hour |
+-------+-----+------+
|     1 |   1 |    1 |
|     1 |   1 |    2 |
|     1 |   1 |    3 |
|     1 |   1 |    4 |
|     1 |   2 |    1 |
|     1 |   2 |    2 |
|     1 |   2 |    3 |
|     1 |   2 |    4 |
|     2 |   1 |    1 |
|     2 |   1 |    2 |
|     2 |   1 |    3 |
|     2 |   1 |    4 |
+-------+-----+------+
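
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above. The name df1 is an assumption borrowed from the code in the solution; the question itself does not name the data frame.

```r
# Rebuild the example input; df1 is an assumed name, not given in the question.
df1 <- data.frame(
  Month = rep(c(1, 1, 2), each = 4),  # 1 1 1 1 1 1 1 1 2 2 2 2
  Day   = rep(c(1, 2, 1), each = 4),  # 1 1 1 1 2 2 2 2 1 1 1 1
  Hour  = rep(1:4, times = 3)         # 1 2 3 4 repeated for each day
)
```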

I would like to cut Hour into blocks within each Month and Day to get this:

+-------+-----+------+-------+
| Month | Day | Hour | Block |
+-------+-----+------+-------+
|     1 |   1 |    1 | [1,2] |
|     1 |   1 |    2 | [1,2] |
|     1 |   1 |    3 | [3,4] |
|     1 |   1 |    4 | [3,4] |
|     1 |   2 |    1 | [1,2] |
|     1 |   2 |    2 | [1,2] |
|     1 |   2 |    3 | [3,4] |
|     1 |   2 |    4 | [3,4] |
|     2 |   1 |    1 | [1,2] |
|     2 |   1 |    2 | [1,2] |
|     2 |   1 |    3 | [3,4] |
|     2 |   1 |    4 | [3,4] |
+-------+-----+------+-------+

I thought that by or tapply might work, but I cannot figure out how.


Solution

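  • We can bin Hour into two-hour blocks with cut and then replace the opening parenthesis of each label with a bracket. Note that the second block is labelled [2,4] rather than [3,4], because cut names each interval by its break points: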

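    # Breaks at 1, 2, 4, ..., 24; include.lowest=TRUE closes the first interval so Hour 1 falls in [1,2]
    df1$Block <- cut(df1$Hour, c(1, seq(2, 24, by=2)), include.lowest=TRUE)
    # Labels after the first look like "(2,4]"; swap the literal "(" for "["
    df1$Block <- sub("(", "[", df1$Block, fixed=TRUE)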
    df1
    #    Month Day Hour Block
    # 1      1   1    1 [1,2]
    # 2      1   1    2 [1,2]
    # 3      1   1    3 [2,4]
    # 4      1   1    4 [2,4]
    # 5      1   2    1 [1,2]
    # 6      1   2    2 [1,2]
    # 7      1   2    3 [2,4]
    # 8      1   2    4 [2,4]
    # 9      2   1    1 [1,2]
    # 10     2   1    2 [1,2]
    # 11     2   1    3 [2,4]
    # 12     2   1    4 [2,4]
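
If the labels need to match the requested output exactly ([3,4] rather than [2,4]), one option is to pass custom labels to cut. This is only a sketch under a few assumptions: hours are whole numbers from 1 to 24, the data frame is called df1 and is rebuilt here so the block runs on its own, and the breaks are shifted to start at 0 so every block spans two whole hours. Because hours restart at 1 on every day, the blocks come out the same for each Month and Day without any explicit grouping.

```r
# Example input (same values as the table in the question)
df1 <- data.frame(
  Month = rep(c(1, 1, 2), each = 4),
  Day   = rep(c(1, 2, 1), each = 4),
  Hour  = rep(1:4, times = 3)
)

# Breaks at 0, 2, 4, ..., 24 give the intervals (0,2], (2,4], ..., (22,24]
breaks <- seq(0, 24, by = 2)

# One label per interval: "[1,2]", "[3,4]", ..., "[23,24]"
block_labels <- sprintf("[%d,%d]", seq(1, 23, by = 2), seq(2, 24, by = 2))

# cut returns a factor with the custom labels as its levels
df1$Block <- cut(df1$Hour, breaks = breaks, labels = block_labels)
df1
```

Block is a factor here; wrap the call in as.character() if plain strings are preferred.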