
Inserting Row in Missing Hourly Data in R using padr package - weird error


I am new to R and I am having some issues with the padr package described here.

I have an hourly data set that is missing some hours, and I would like to insert a row for each missing hour so I can fill in a value for it. I am trying to use the pad and fill_by_value functions from the padr package, but I get an error when I call pad.

My data frame, called Mendo, looks like this:

Date.Local    Time.Local    Sample.Measurement
2016-01-01    00:00:00                      3
2016-01-01    01:00:00                      4
2016-01-01    02:00:00                      1
2016-01-01    04:00:00                      4
2016-01-01    05:00:00                      5

I want the final data to look like:

Date.Local    Time.Local    Sample.Measurement
2016-01-01    00:00:00                      3
2016-01-01    01:00:00                      4
2016-01-01    02:00:00                      1
2016-01-01    03:00:00                    999
2016-01-01    04:00:00                      4
2016-01-01    05:00:00                      5

I am under the impression that padr needs a POSIXct datetime column, so I use the command

Mendo$Time.Local <- as.POSIXct(paste(Mendo$Date.Local, Mendo$Time.Local), format = '%Y-%m-%d %H:%M')

to get:

Time.Local             Sample.Measurement
2016-01-01 00:00:00                      3
2016-01-01 01:00:00                      4
2016-01-01 02:00:00                      1
2016-01-01 04:00:00                      4
2016-01-01 05:00:00                      5
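A minimal, self-contained sketch of this conversion step, useful for checking whether any rows fail to parse before calling pad. The data frame is hypothetical: it assumes hourly times stored as "HH:MM" character strings (as read.csv(..., stringsAsFactors = FALSE) would return them) and adds a deliberately blank sixth time that stands in for any value not matching the format. tz = "UTC" is also an assumption, used only so the result does not depend on the computer's time zone.

```r
# Hypothetical toy data: hourly readings with 03:00 missing, plus a sixth row
# whose blank time string stands in for any value that fails to parse.
Mendo <- data.frame(
  Date.Local = rep("2016-01-01", 6),
  Time.Local = c("00:00", "01:00", "02:00", "04:00", "05:00", ""),
  Sample.Measurement = c(3, 4, 1, 4, 5, 2),
  stringsAsFactors = FALSE
)

# Combine date and time into one POSIXct column (tz = "UTC" is an assumption)
Mendo$Time.Local <- as.POSIXct(paste(Mendo$Date.Local, Mendo$Time.Local),
                               format = "%Y-%m-%d %H:%M", tz = "UTC")

# Strings that do not match the format become NA; inspect those rows
bad_rows <- Mendo[is.na(Mendo$Time.Local), ]
print(bad_rows)

# Keep only the datetime and measurement columns, dropping unparseable rows
Mendo <- na.omit(Mendo[, c("Time.Local", "Sample.Measurement")])
```

If bad_rows is non-empty for the real data, those NA datetimes are worth inspecting before any padding is attempted.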

Now I try to use the pad function as described in the link above. My line of code is:

Mendo_padded <- Mendo %>% pad

and I get the error:

Error in if (total_invalid == nrow(x)) { :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In if (unique(nchar(x_char)) == 10) { :
  the condition has length > 1 and only the first element will be used

If this were to work, I would then use the command

Mendo_padded %>% fill_by_value(Sample.Measurement, value = 999)

to set Sample.Measurement to 999 for all the missing hours.
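A self-contained sketch of the intended pad + fill_by_value workflow, run on hypothetical toy data (the example's hourly values with 03:00 missing, already parsed to POSIXct; using tz = "UTC" is an assumption). The padr functions are called directly rather than through %>%, so only padr needs to be loaded.

```r
library(padr)

# Hypothetical toy data: already-parsed hourly readings, 03:00 is missing
Mendo <- data.frame(
  Time.Local = as.POSIXct(c("2016-01-01 00:00", "2016-01-01 01:00",
                            "2016-01-01 02:00", "2016-01-01 04:00",
                            "2016-01-01 05:00"),
                          format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Sample.Measurement = c(3, 4, 1, 4, 5)
)

# pad() finds the POSIXct column itself and inserts a row for each missing
# hour; Sample.Measurement is NA in the inserted rows
Mendo_padded <- pad(Mendo, interval = "hour")

# fill_by_value() replaces the NAs in the named column with 999
Mendo_filled <- fill_by_value(Mendo_padded, Sample.Measurement, value = 999)
print(Mendo_filled)
```

This only works once the datetime column is free of NA values, which is why the parsing step deserves checking first.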

I would love feedback, suggestions or comments on what I may be doing wrong and how I can go about getting this code to work! Thank you!


Solution

  • pad automatically detects which column is of class Date, POSIXct or POSIXlt, so you do not need to pass Mendo$Time.Local to pad. With interval = 'hour', the padding is applied at hourly intervals.

    library(magrittr)
    library(padr)
    
    PM10 <- read.csv(file="../Downloads/hourly_81102_2016.csv",
                     stringsAsFactors = FALSE) # don't change the columns to factors
    Mendo <- PM10[PM10$County.Name == "Mendocino",]
    Mendo$Time.Local <-
      as.POSIXct(paste(
        Mendo$Date.Local, Mendo$Time.Local), format = '%Y-%m-%d %H:%M')
    Mendo <- Mendo[,c("Time.Local", "Sample.Measurement")]
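    
    # Drop rows containing NA (e.g. date/time strings that failed to parse),
    # then insert a row for every missing hour. start_val, end_val, group and
    # break_above are given at their default values here.
    Mendo_padded <- Mendo %>% na.omit %>%
      pad(interval = 'hour', 
          start_val = NULL, end_val = NULL, group = NULL, 
          break_above = 1)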
    

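    You may also consider using the Date.GMT and Time.GMT columns instead, because as.POSIXct interprets the local date and time in your computer's time zone. Around daylight-saving changes, some local times do not exist in that zone and may be converted to NA.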

    Edit: As suggested by the OP, apply na.omit before pad to remove rows with NA values in the datetime column.
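A sketch of the GMT-based variant, assuming the GMT columns are character strings in "YYYY-MM-DD" and "HH:MM" form; the rows below are hypothetical. The date 2016-03-13 is the day US clocks moved forward, so 02:00 local time did not exist in California that day, whereas UTC has no such gaps. Parsing in UTC, dropping NA rows and then padding follows the steps described in the answer.

```r
library(padr)

# Hypothetical rows on the US daylight-saving changeover day; 11:00 is missing
Mendo <- data.frame(
  Date.GMT = rep("2016-03-13", 4),
  Time.GMT = c("09:00", "10:00", "12:00", "13:00"),
  Sample.Measurement = c(3, 4, 1, 4),
  stringsAsFactors = FALSE
)

# Parse in UTC so the result does not depend on the computer's time zone
Mendo$Time.GMT <- as.POSIXct(paste(Mendo$Date.GMT, Mendo$Time.GMT),
                             format = "%Y-%m-%d %H:%M", tz = "UTC")
Mendo <- Mendo[, c("Time.GMT", "Sample.Measurement")]

# Remove NA rows first, then insert the missing hours
Mendo_padded <- pad(na.omit(Mendo), interval = "hour")
print(Mendo_padded)
```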