
Median in R needs numeric data


I have a dataset of flats with rooms (the number of rooms) and balcony_size (the balcony size), and I would like to find the median balcony size for each number of rooms.

data_new%>%
  group_by(rooms)%>%
  median(balcony_size, na.rm=TRUE)

This code returns an error:

Error in median.default(., balcony_size, na.rm = TRUE) : 
  need numeric data

However, balcony_size is numeric:

data_new$balcony_size
   [1]    NA    NA    NA    NA  3.00  2.00  2.00  5.00    NA    NA    NA  4.00  2.00    NA  3.00    NA    NA
  [18]    NA 10.00 44.00  7.50    NA 62.00 29.00 12.00  8.00    NA    NA  6.00  6.00  8.00    NA    NA    NA
  [35]    NA  5.00  4.00    NA 15.00    NA    NA    NA  8.00    NA    NA    NA    NA  8.00    NA    NA    NA
  [52]  6.00  8.00  5.00 10.00    NA  5.00  1.00    NA  2.00 33.00  4.00    NA  4.00  6.00  5.00 12.00 15.00
> str(data_new$balcony_size)
 num [1:40099] NA NA NA NA 3 2 2 5 NA NA ...

Solution

  • We can use median inside mutate if the goal is to create a new column:

    library(dplyr)
    data_new%>%
        group_by(rooms)%>%
        mutate(Median = median(balcony_size, na.rm=TRUE))
    

    Or, if we only need the summarised output, use summarise:

    data_new%>%
        group_by(rooms)%>%
        summarise(Median = median(balcony_size, na.rm=TRUE))
    

    Or using base R

    aggregate(balcony_size ~ rooms, data_new, median, na.rm = TRUE, na.action = NULL)
    

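    If we apply median directly after group_by, the pipe passes the entire (grouped) data frame as the first argument of median. median works on a vector, not on a data.frame, which is why it reports "need numeric data" even though balcony_size itself is numeric.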
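
To make the difference concrete, here is a minimal sketch on a toy data frame. The name flats and its values are made up for illustration and are not taken from the original data_new. It first reproduces the error by handing a whole data frame to median, then computes the per-group median inside summarise.

```r
library(dplyr)

# Toy data (hypothetical values, standing in for data_new)
flats <- data.frame(
  rooms        = c(1, 1, 2, 2, 2, 3),
  balcony_size = c(3, NA, 2, 5, NA, 10)
)

# Roughly what the pipe in the question does: the data frame itself
# becomes the first argument of median(), so median.default() refuses it.
err <- tryCatch(
  median(flats, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(err)  # "need numeric data" (in an English locale)

# Inside summarise(), median() receives the balcony_size vector
# of each rooms group, so it works as intended.
res <- flats %>%
  group_by(rooms) %>%
  summarise(Median = median(balcony_size, na.rm = TRUE))
print(res)
```

With na.rm = TRUE the NA values are dropped within each group before the median is taken, so a group only returns NA if all of its balcony_size values are missing.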