Search code examples
rstatisticsdata.tablemedian

How to calculate the median for each minute of a dataframe?


I have a data frame (~15.000 lines), like:

time                   value
01-01-2019 08:09:25    5,3
01-01-2019 08:09:26    5,5
01-01-2019 08:09:27    6,1
...
01-01-2019 08:09:58    5,1
01-01-2019 08:09:59    5,4
01-01-2019 08:10:00    6,5
01-01-2019 08:10:01    5,2
01-01-2019 08:10:02    6,2
01-01-2019 08:10:03    5,4
...

In addition, there are missing ~ 10 lines. That means sometimes there are only 59 seconds in one minute. I do not know if that plays a role.

I found this online, but it didn't work:

library(lubridate)
dd[, c('Hour', 'Minute') := .(datastrom::hour(zeit), minute(zeit))
][, .(Avg = mean(strom)), .(Hour, Minute)]

I need the median for each minute. It would be really nice, if somebody could help me!

Expected result:

time                   value
01-01-2019 08:09       5,4 (=median of all values between 08:09:00 and 08:09:59)
01-01-2019 08:10       5,8 (=median of all values between 08:10:00 and 08:10:59)
...

Solution

  • I made a reproducible example...

    library(data.table)
    dd <- read.table(text="time                   value
    '01-01-2019 08:09:25'    5,3
    '01-01-2019 08:09:26'    5,5
    '01-01-2019 08:10:00'    6,5
    '01-01-2019 08:10:01'    5,2
    '01-01-2019 08:10:02'    6,2
    '01-01-2019 08:10:03'    5,4", header =TRUE)#
    dd <- data.table(dd)
    dd[,value:=as.numeric(gsub(",",".",value))]
    dd[,day:=substr(time,1,10)]
    dd[,hour:=substr(time,12,16)]
    ## only listing the median
    dd[,list(median_value=median(value,na.rm = TRUE)),by=.(day,hour)]
    ## or an new data table
    dd_median <- dd[,list(median_value=median(value,na.rm = TRUE)),by=.(day,hour)]
    

    Just make a data.table (dd) of your data frame.