Search code examples
rtimeconditional-statementsdata-cleaning

Conditional adjustment to poorly formatted 12-hour time data


I have a timeseries of 'Detection' data. The time format uses a 12-hour clock and a preceeding 'A' or 'P' to denote AM or PM. Where the 12-hour clock has double digits (10, 11, 12) the preceeding 'A' or 'P' is not followed by a space character (the file separator). This means that times are not read correctly so the time data is combined into the AM.PM column and the Detection data shifts a column to the left. See example dataset below.

For all double-digit hours, I want to separate the A or P character from the time data and put the data in their respective columns.

I have tried doing this by conditionally inserting a space character but I haven't been able to move the data back into their proper place. I would be very grateful for any advice or solutions to this, thank you!

df <- data.frame(AM.PM  = c("P", "P", "P10:00:00.943", "P10:00:01.994"),
                 Time = c("9:59:59.862", "9:59:59.862", "[7307455B02]", "[800750CD02]"),
                 Detection = c("[800750CD02]", "[7507455B02]", "", ""))

The current output:

          AM.PM         Time    Detection
1             P  9:59:59.862 [800750CD02]
2             P  9:59:59.862 [7507455B02]
3 P10:00:00.943 [7307455B02]             
4 P10:00:01.994 [800750CD02]             

The desired output:

          AM.PM         Time    Detection
1             P   9:59:59.862 [800750CD02]
2             P   9:59:59.862 [7507455B02]
3             P  10:00:00.943 [7307455B02]             
4             P  10:00:01.994 [800750CD02]             

Solution

  • For the rows where AM.PM column has value which is more than one character you can split the column into 2 and rename the columns.

    library(dplyr)
    library(tidyr)
    
    inds <- nchar(df$AM.PM) > 1
    
    df[inds, ] <- df[inds, ]  %>%
                    extract(AM.PM, c('AM.PM', 'Time1'), '(.)(.*)') %>%
                    select(AM.PM, Time = Time1, Detection = Time)
    
    df
    
    #  AM.PM         Time    Detection
    #1     P  9:59:59.862 [800750CD02]
    #2     P  9:59:59.862 [7507455B02]
    #3     P 10:00:00.943 [7307455B02]
    #4     P 10:00:01.994 [800750CD02]
    

    extract divides the column into two, first column keeps only the first character whereas second column consists of everything else.