
Median plot per month with the number of observations per month on a second axis, in RStudio


I have a data.frame with 2 variables and about 2.5 million observations:

    > str(values)
    'data.frame':    2529905 obs. of  2 variables:
     $ Date : Factor w/ 498 levels "1977-11","1978-06",..: 108 60 12 108 58 108 132 188 51 60 ...
     $ Value: num  223000 171528 110269 426000 172436 ...
    > head(values)
         Date    Value
    1 2003-01 223000.0
    2 1999-01 171528.0
    3 1992-01 110268.6
    4 2003-01 426000.0
    5 1998-11 172436.5
    6 2003-01 334000.0

I wanted to make a data.frame with the median per date:

    library(plyr)
    medianperdate = ddply(values, .(Date), summarize, median_value = median(Value))

    > str(medianperdate)
    'data.frame':   498 obs. of  2 variables:
     $ Date        : Factor w/ 498 levels "1977-11","1978-06",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ median_value: num  106638 84948 85084 75725 88487 ...
    > head(medianperdate)
         Date median_value
    1 1977-11    106638.35
    2 1978-06     84947.65
    3 1985-07     85083.79
    4 1986-05     75724.58
    5 1986-11     88487.14
    6 1986-12     98697.20

What I also want is an extra column that counts the observations per month (e.g. how many rows of the object `values` have Date 2003-01).
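A minimal sketch, assuming plyr is installed: `summarize` inside `ddply` accepts several `name = expression` pairs, so the count can be added to the same call as `length(Value)`. The six rows printed by `head(values)` stand in for the full data here, and the column name `n_obs` is chosen only for illustration.

```r
library(plyr)

# Stand-in data: the six rows shown by head(values), not the real 2.5M rows
values <- data.frame(
  Date  = factor(c("2003-01", "1999-01", "1992-01", "2003-01", "1998-11", "2003-01")),
  Value = c(223000, 171528, 110268.6, 426000, 172436.5, 334000)
)

# One row per Date: the median and the number of rows (observations) in that month
medianperdate <- ddply(values, .(Date), summarize,
                       median_value = median(Value),
                       n_obs        = length(Value))
print(medianperdate)
```

With this sample, 2003-01 has three observations and a median of 334000; every other month has one.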

I would also like a second extra column that assigns a house class based on the value:

    a : < 200 000
    b : > 200 000 & < 300 000
    c : > 300 000 & < 2 000 000
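These ranges map directly onto base R's `cut()`. This is a sketch under stated assumptions: `house_class` is a hypothetical helper name, class c is taken to end at 2 000 000, and a value exactly on a boundary (e.g. 200 000) is assumed to fall into the higher class (`right = FALSE`). Values of 2 000 000 or more come out as `NA`.

```r
# Hypothetical helper: classify values into the a/b/c ranges listed above.
# Assumed intervals (right = FALSE): a = [-Inf, 200000), b = [200000, 300000),
# c = [300000, 2000000); anything >= 2000000 becomes NA.
house_class <- function(x) {
  cut(x,
      breaks = c(-Inf, 200000, 300000, 2000000),
      labels = c("a", "b", "c"),
      right  = FALSE)
}

house_class(c(150000, 250000, 334000, 2500000))
```

Applied to the per-month table, this would be e.g. `medianperdate$HouseClass <- house_class(medianperdate$median_value)`.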

I will keep trying, but since I have already been stuck on this for a couple of hours, I would really appreciate some help!

In case this is unclear, the following test data frame shows how I would like the result to look:

    > testdf
         Year MedianValue HouseClass #Observations
    1  1999-1      200000          B           501
    2  1999-2      150000          A           664
    3  1999-3      250000          C           555

Solution

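  • Like my answer to your previous question, data.table handles this directly:

    library(data.table)
    dt <- data.table(values)

    # median value and number of observations (.N) per month
    dt2 <- dt[, list(
        medianvalue = median(Value),
        obs = .N
      ),
      by = "Date"
    ]

    # default class "c", then overwrite: "b" for medians below 300 000, "a" below 200 000
    dt2[, HouseClass := "c"]
    dt2[medianvalue < 300000, HouseClass := "b"]
    dt2[medianvalue < 200000, HouseClass := "a"]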
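To check the approach end to end, here is a small run on the six rows printed by `head(values)` (stand-in data, not the real 2.5 million rows). The `setnames`/`setcolorder`/`setorder` step is an optional addition that mirrors the layout of `testdf`; the name `Observations` is used instead of `#Observations` because a column name starting with `#` would need backticks every time it is referenced.

```r
library(data.table)

# Stand-in data: the six rows shown by head(values)
values <- data.frame(
  Date  = factor(c("2003-01", "1999-01", "1992-01", "2003-01", "1998-11", "2003-01")),
  Value = c(223000, 171528, 110268.6, 426000, 172436.5, 334000)
)

dt  <- as.data.table(values)
dt2 <- dt[, list(medianvalue = median(Value), obs = .N), by = "Date"]

# Classify each month by its median value
dt2[, HouseClass := "c"]
dt2[medianvalue < 300000, HouseClass := "b"]
dt2[medianvalue < 200000, HouseClass := "a"]

# Optional: rename, reorder and sort to resemble testdf
setnames(dt2, c("Date", "medianvalue", "obs"), c("Year", "MedianValue", "Observations"))
setcolorder(dt2, c("Year", "MedianValue", "HouseClass", "Observations"))
setorder(dt2, Year)
print(dt2)
```

Here 2003-01 (median 334000, three observations) ends up in class c and the three single-observation months in class a.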