Tags: r, group-by, dplyr, grouping, median

Calculate a new median for each row by id


I know this may be very simple but I can't solve it.

I have the following df:

Input Data

df<-data.frame(id=c(1,2,3,3,3,4, 4, 4, 4, 4, 4), value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))

Desired Result

desired <- data.frame(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), median = c(956, 986, 995, 995, 995, 700, 650, 700, 828, 956, 971))

That is, for each row I want the median of all values seen so far within the same id. Each new row adds one value to its group, and the median should then be recalculated (a running, or cumulative, median per id).
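To make the definition concrete, here is a base-R sketch (no packages) that computes the running median of the six id 4 values by hand. The variable names `x` and `running_median` are purely illustrative. Element i is the median of the first i values, which reproduces the id 4 part of the desired `median` column:

```r
# Values for id 4, in row order (taken from the example data above)
x <- c(700, 600, 995, 956, 1000, 986)

# Element i is the median of x[1], ..., x[i]; with an even count,
# median() averages the two middle values, e.g. (600 + 700) / 2 = 650
running_median <- sapply(seq_along(x), function(i) median(x[seq_len(i)]))

print(running_median)  # 700 650 700 828 956 971
```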

What I tried

library(dplyr)

# per-id median of value, attached to every row of that id
w <- df %>%
  group_by(id) %>%
  mutate(median = median(value, na.rm = TRUE)) %>%
  ungroup()
df$median <- w$median


This gives the whole-group median on every row instead of a running one:

result <- data.frame(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), median = c(956, 986, 995, 995, 995, 971, 971, 971, 971, 971, 971))
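The constant 971 is simply the median of all six id 4 values: `median(value)` inside a grouped `mutate()` sees the whole group at once. A quick base-R check with `tapply()`, which computes one median per id (the name `group_medians` is illustrative only):

```r
df <- data.frame(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
                 value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))

# One median per id over the whole group; this single number is what
# mutate(median = median(value)) repeats on every row of that group
group_medians <- tapply(df$value, df$id, median)
print(group_medians)
```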

Solution

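  • You can use zoo::rollapplyr to calculate a running (expanding-window) median. Passing seq_along(value) as the width makes the right-aligned window at row i of a group cover rows 1 to i of that group, so each row gets the median of all values seen so far for its id: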

    library(dplyr)
    library(zoo)
    df %>%
        group_by(id) %>%
        # window width i at row i: median of rows 1..i within each id
        mutate(
            median = rollapplyr(value, seq_along(value), median))
    ## A tibble: 11 x 3
    ## Groups:   id [4]
    #      id value median
    #   <dbl> <dbl>  <dbl>
    # 1    1.  956.   956.
    # 2    2.  986.   986.
    # 3    3.  995.   995.
    # 4    3.  995.   995.
    # 5    3.  986.   995.
    # 6    4.  700.   700.
    # 7    4.  600.   650.
    # 8    4.  995.   700.
    # 9    4.  956.   828.
    #10    4. 1000.   956.
    #11    4.  986.   971.
    

    Sample data

    df <- data.frame(
        id = c(1,2,3,3,3,4, 4, 4, 4, 4, 4), 
        value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))
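If you would rather not depend on zoo, a base-R sketch of the same running median uses `ave()`, which applies a function to each id's values in their original row order and writes the results back in place. This is an alternative to `rollapplyr`, not the answer's method; it recomputes each median from scratch, so its cost grows quadratically with group size and it is best suited to small groups:

```r
df <- data.frame(
    id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
    value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986))

# ave() splits value by id, calls FUN on each group's vector, and returns
# a vector aligned with the original rows
df$median <- ave(df$value, df$id, FUN = function(v) {
    # element i: median of the first i values seen in this id
    sapply(seq_along(v), function(i) median(v[seq_len(i)]))
})

print(df)
```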