R time formatting with dirty data


I'm using R to generate a CZML file from a database.

The database has dirty data.

I need a way to make sure times are in the format "%H:%M:%S".

The data may already be in the correct %H:%M:%S form, or it may be missing the leading zero on the hour, e.g. 8:30:00, which is not valid ISO 8601 and throws the CZML parsing off entirely.

It always needs to look like 08:30:00 or 07:09:00, in 24-hour format.

The errors occur because the values look like 8:30:00 or 7:09:00 (still in 24-hour format). I haven't checked whether the minutes or seconds are malformed too; for the moment I assume they are correct and that only the hours are the problem.

For example, I have a CSV file like this:

"Date","Time","TZ","Jul.Time","BirdID","Species","Sex","Age","SiteID","Latitude","Longitude"
"4-Mar-13","08:30:00","America/Costa_Rica",2456356.187500,"test2","GREH","M","AHY","56scr25",8.71191178,-82.96866316
"4-Mar-13","8:30:00","America/Costa_Rica",2456356.187500,"test2","GREH","M","AHY","56scr25",8.71191178,-82.96866316

I need to generate a CZML like so:

"point": {
        "color": { 
            "rgba": [
"2013-03-04T08:30:00Z",225,50,50,196,"2013-03-04T08:30:01Z",50,50,225,196,"2013-03-04T13:30:00Z",225,50,50,196,"2013-03-04T13:30:01Z",50,50,225,196,"2013-03-04T16:00:00Z",225,50,50,196,"2013-03-04T16:00:01Z",50,50,225,196
            ]
        },
        "pixelSize": { "number": 10 }
    }

My code is like so:

        j=1
        numVisits=nrow(visitedTimes)
        while(j<=numVisits){
            date=as.Date(visitedTimes$Date[j], format="%d-%b-%y")
            time=format(visitedTimes$Time[j], format="%H:%M:%S")
            timeOfPassage=paste0(date,"T",time,"Z")
            timeAfter=as.POSIXlt(timeOfPassage, format="%Y-%m-%dT%H:%M:%SZ")
            timeAfter$sec=timeAfter$sec+1
            timeAfter=format(timeAfter, format="%Y-%m-%dT%H:%M:%SZ")
            cat(paste0("\"",timeOfPassage,"\","))
            cat("225,50,50,196,")
            cat(paste0("\"",timeAfter,"\","))
            cat("50,50,225,196")
            if(j<numVisits){
                cat(",")
            }
            j=j+1
        }

But it doesn't produce the desired output because of the dirty data. Any ideas?


Solution

  • We can use times() from the chron package

    library(chron)
    times(v1)
    #[1] 08:30:00 08:30:00 07:09:00 07:09:00
    

    Or using base R

    format(strptime(v2, '%H:%M:%S'), '%H:%M:%S')
    #[1] "08:30:00" "08:30:00" "07:09:00" "07:09:00" "07:09:05" "11:10:00"
    

    Using the OP's updated dataset

    df1$Time <- times(df1$Time)
    df1$Time
    #[1] 08:30:00 08:30:00
    

    Or using regex

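    # pads only a single-digit hour
    sub('^(.:)', '0\\1', df1$Time)
    #[1] "08:30:00" "08:30:00"

    # pads any single-digit field; (*SKIP)(*F) requires perl=TRUE
    gsub('[^:]{2}(*SKIP)(*F)|(\\d)', '0\\1', v2, perl=TRUE)
    #[1] "08:30:00" "08:30:00" "07:09:00" "07:09:00" "07:09:05" "11:10:00"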
    

    data

    v1 <- c('8:30:00', '08:30:00', '7:09:00', '7:9:00')
    v2 <- c(v1, '7:9:5', '11:10:0')
    
    
    df1 <- structure(list(Date = c("4-Mar-13", "4-Mar-13"), Time = c("08:30:00", 
    "8:30:00"), TZ = c("America/Costa_Rica", "America/Costa_Rica"
    ), Jul.Time = c(2456356.1875, 2456356.1875), BirdID = c("test2", 
    "test2"), Species = c("GREH", "GREH"), Sex = c("M", "M"), Age = c("AHY", 
    "AHY"), SiteID = c("56scr25", "56scr25"), Latitude = c(8.71191178, 
    8.71191178), Longitude = c(-82.96866316, -82.96866316)), .Names = c("Date", 
    "Time", "TZ", "Jul.Time", "BirdID", "Species", "Sex", "Age", 
    "SiteID", "Latitude", "Longitude"), class = "data.frame", row.names = c(NA, 
    -2L))
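
To tie the base R approach back to the loop in the question, here is a minimal vectorized sketch. The helper name build_rgba_samples is hypothetical (it does not appear in the original post). It assumes, as the question's code does, that the local time is written out with a trailing "Z" unchanged, and that month abbreviations such as "Mar" match the current locale. It relies on strptime() accepting a missing leading zero on input, so no separate cleaning step is needed.

```r
# Hypothetical helper (not from the original post): returns the flat
# "time,r,g,b,a,..." sequence that goes inside the CZML "rgba" array.
build_rgba_samples <- function(dates, times) {
  # Parse date and time together. Leading zeros are optional on input,
  # so "8:30:00" and "08:30:00" give the same result.
  # Note: %b depends on the locale (an English locale is assumed here).
  stamp <- as.POSIXct(
    strptime(paste(dates, times), "%d-%b-%y %H:%M:%S", tz = "UTC")
  )

  # Fail loudly on rows that still cannot be parsed.
  if (anyNA(stamp)) {
    stop("unparseable date/time in rows: ",
         paste(which(is.na(stamp)), collapse = ", "))
  }

  iso <- "%Y-%m-%dT%H:%M:%SZ"
  at    <- format(stamp,     iso, tz = "UTC")  # time of passage
  after <- format(stamp + 1, iso, tz = "UTC")  # one second later

  # Same colors as in the question's loop, joined with commas.
  paste0('"', at, '",225,50,50,196,"', after, '",50,50,225,196',
         collapse = ",")
}

# Usage with the two sample rows from the question's CSV.
out <- build_rgba_samples(c("4-Mar-13", "4-Mar-13"),
                          c("08:30:00", "8:30:00"))
cat(out, "\n")
```

Both sample rows yield the same zero-padded timestamp, and adding one second to a POSIXct value also rolls over minutes, hours, and dates correctly.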