I am new to R and I want a new data set from my dataframe
that will include a new column which represents the median
of the values in an existing column (called Total Extras) of the dataframe
. The latter consists of around 5,000 individual observations.
I am a bit confused on how to proceed with this task as the Median need to be calculated based on the following criteria: Property, Month, Year and Market
Currently, my dataframe
(let's call it mydata1) stands as follows (first 5 rows shown):
Property Date Month Year Market TotalExtras
ZIL 1-Jan-15 1 2015 UK 450.00
ZIL 1-Jan-15 1 2015 UK 125.00
ZIL 1-Feb-15 2 2015 UK 300.00
ZIL 1-Feb-16 2 2016 FR 225.00
EBA 1-Feb-15 2 2015 UK 150.00
...
I need my R codes to create a new dataframe
(let's call it mydata2) to appear like below:
Property Date Month Year Market MedianTotalExtras
ZIL 1-Jan-15 1 2015 UK 175.00
ZIL 1-Feb-15 2 2015 UK 250.00
ZIL 1-Feb-16 2 2016 FR 400.00
EBA 1-Feb-15 2 2015 UK 328.00
...
The figures above are for illustration purposes only. Basically, mydata2 is re-grouping the data based on Property, Date and Market with the column 'Median Total Extras' replacing the 'TotalExtras' column of mydata1.
Can this be done with R?
In dplyr
the general gist will be something like:
mydata1 %>%
group_by(Property, Date, Market) %>%
summarise(MedianTotalExtras = median(TotalExtras))
where group_by
arranges the cutting up of the dataset into pieces with unique Property
, Date
, Market
combos, and the summarise
+ median
calculates the median.