I have a dataset of flats, with rooms as the number of rooms and balcony_size as the balcony size, and I would like to get the median balcony size for each number of rooms.
data_new%>%
group_by(rooms)%>%
median(balcony_size, na.rm=TRUE)
This code returns an error:
Error in median.default(., balcony_size, na.rm = TRUE) :
need numeric data
but balcony_size is numeric:
data_new$balcony_size
[1] NA NA NA NA 3.00 2.00 2.00 5.00 NA NA NA 4.00 2.00 NA 3.00 NA NA
[18] NA 10.00 44.00 7.50 NA 62.00 29.00 12.00 8.00 NA NA 6.00 6.00 8.00 NA NA NA
[35] NA 5.00 4.00 NA 15.00 NA NA NA 8.00 NA NA NA NA 8.00 NA NA NA
[52] 6.00 8.00 5.00 10.00 NA 5.00 1.00 NA 2.00 33.00 4.00 NA 4.00 6.00 5.00 12.00 15.00
> str(data_new$balcony_size)
num [1:40099] NA NA NA NA 3 2 2 5 NA NA ...
We can use median in mutate if the goal is to create a new column:
library(dplyr)
data_new%>%
group_by(rooms)%>%
mutate(Median = median(balcony_size, na.rm=TRUE))
Or, if we need only the summarised output:
data_new%>%
group_by(rooms)%>%
summarise(Median = median(balcony_size, na.rm=TRUE))
Or using base R:
aggregate(balcony_size ~ rooms, data_new, median, na.rm = TRUE, na.action = NULL)
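If we apply median directly after the group_by, the pipe passes the entire (grouped) dataset as its first argument, and median works on a vector, not on a data.frame. That is why it reports "need numeric data" even though balcony_size itself is numeric.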
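As a minimal sketch of the difference, the snippet below uses a small made-up data frame (toy is hypothetical and only stands in for data_new) to show the failing call next to the summarise version. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for data_new (values are made up)
toy <- data.frame(
  rooms        = c(1, 1, 1, 2, 2, 3),
  balcony_size = c(NA, 3, 5, 2, 8, NA)
)

# Piping into median() hands the whole grouped data frame to median() as `x`,
# so it fails; tryCatch() captures the error message instead of stopping
res <- tryCatch(
  toy %>% group_by(rooms) %>% median(balcony_size, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)

# Inside summarise(), median() receives the balcony_size vector of each group
out <- toy %>%
  group_by(rooms) %>%
  summarise(Median = median(balcony_size, na.rm = TRUE))
print(out)
```

Note that a group whose balcony_size values are all NA (rooms == 3 in this toy data) still appears in the result, with NA as its median.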