Search code examples
runiquedata-manipulation

In R how do you create a column with unique values based on time and date values in multiple columns


I have a data frame with two columns, see below:

Date        Time  
2020-01-12  00:00:01  
2020-01-12  00:00:02  
2020-01-12  00:01:01  
2020-01-12  00:01:02  
2020-01-12  00:02:01  
2020-01-12  00:02:02  
2020-01-12  00:10:01  
2020-01-12  00:10:02  
2020-01-12  00:11:01  
2020-01-12  00:11:02  
2020-01-12  00:12:01  
2020-01-12  00:12:02  
2020-01-12  00:20:01  
2020-01-12  00:20:02  
2020-01-12  00:21:01  
2020-01-12  00:21:02  
2020-01-12  00:22:01  
2020-01-12  00:22:02        
2020-01-13  00:00:01  
2020-01-13  00:00:02  
2020-01-13  00:01:01  
2020-01-13  00:01:02  
2020-01-13  00:02:01  
2020-01-13  00:02:02  
2020-01-13  00:10:01  
2020-01-13  00:10:02  
2020-01-13  00:11:01  
2020-01-13  00:11:02  
2020-01-13  00:12:01  
2020-01-13  00:12:02  
2020-01-13  00:20:01  
2020-01-13  00:20:02  
2020-01-13  00:21:01  
2020-01-13  00:21:02  
2020-01-13  00:22:01  
2020-01-13  00:22:02  

I want to create a third column with unique values starting at 1 and going to x, based on values in the first two columns. Specifically, I want to focus on the minutes place in the second column first and then the day place in the first column second, so that the results looks like below:

Date        Time      Value  
2020-01-12  00:00:01  1  
2020-01-12  00:00:02  1  
2020-01-12  00:01:01  1  
2020-01-12  00:01:02  1  
2020-01-12  00:02:01  1  
2020-01-12  00:02:02  1  
2020-01-12  00:10:01  2  
2020-01-12  00:10:02  2  
2020-01-12  00:11:01  2  
2020-01-12  00:11:02  2  
2020-01-12  00:12:01  2  
2020-01-12  00:12:02  2  
2020-01-12  00:20:01  3  
2020-01-12  00:20:02  3  
2020-01-12  00:21:01  3  
2020-01-12  00:21:02  3  
2020-01-12  00:22:01  3  
2020-01-12  00:22:02  3        
2020-01-13  00:00:01  4  
2020-01-13  00:00:02  4  
2020-01-13  00:01:01  4  
2020-01-13  00:01:02  4  
2020-01-13  00:02:01  4  
2020-01-13  00:02:02  4  
2020-01-13  00:10:01  5  
2020-01-13  00:10:02  5  
2020-01-13  00:11:01  5  
2020-01-13  00:11:02  5  
2020-01-13  00:12:01  5  
2020-01-13  00:12:02  5  
2020-01-13  00:20:01  6  
2020-01-13  00:20:02  6  
2020-01-13  00:21:01  6  
2020-01-13  00:21:02  6  
2020-01-13  00:22:01  6  
2020-01-13  00:22:02  6 

Solution

  • In base R you could do:

    transform(df, Value = cumsum(c(1, diff(as.POSIXlt(paste(Date, Time)))>60)))
    
             Date     Time Value
    1  2020-01-12 00:00:01     1
    2  2020-01-12 00:00:02     1
    3  2020-01-12 00:01:01     1
    4  2020-01-12 00:01:02     1
    5  2020-01-12 00:02:01     1
    6  2020-01-12 00:02:02     1
    7  2020-01-12 00:10:01     2
    8  2020-01-12 00:10:02     2
    9  2020-01-12 00:11:01     2
    10 2020-01-12 00:11:02     2
    11 2020-01-12 00:12:01     2
    12 2020-01-12 00:12:02     2
    13 2020-01-12 00:20:01     3
    14 2020-01-12 00:20:02     3
    15 2020-01-12 00:21:01     3
    16 2020-01-12 00:21:02     3
    17 2020-01-12 00:22:01     3
    18 2020-01-12 00:22:02     3
    19 2020-01-13 00:00:01     4
    20 2020-01-13 00:00:02     4
    21 2020-01-13 00:01:01     4
    22 2020-01-13 00:01:02     4
    23 2020-01-13 00:02:01     4
    24 2020-01-13 00:02:02     4
    25 2020-01-13 00:10:01     5
    26 2020-01-13 00:10:02     5
    27 2020-01-13 00:11:01     5
    28 2020-01-13 00:11:02     5
    29 2020-01-13 00:12:01     5
    30 2020-01-13 00:12:02     5
    31 2020-01-13 00:20:01     6
    32 2020-01-13 00:20:02     6
    33 2020-01-13 00:21:01     6
    34 2020-01-13 00:21:02     6
    35 2020-01-13 00:22:01     6
    36 2020-01-13 00:22:02     6