Search code examples
rdata-extraction

Replace missing value with average of that month


I've data of occupancy at a daily level, there are some null(NA) values in Occupancy Percentage.

I want to replace the occupancy percentage of that respective month for that respective ID.

I would be explaining with a sample data(mocked-up data):

Date       ID   Occupancy%
1/2/2018    1   95
2/2/2018    1   94
3/2/2018    1   94
4/2/2018    1   96
5/2/2018    1   94
6/2/2018    1   NA
7/2/2018    1   96
8/2/2018    1   94
1/2/2018    2   75
2/2/2018    2   NA
3/2/2018    2   79
4/2/2018    2   82
5/2/2018    2   NA
6/2/2018    2   76
7/2/2018    2   78
8/2/2018    2   80

I want to replace the NA's with average of that month and with respect to its ID.

I want the outcome to be

  1. 6/2/2018 1 NA should be 6/2/2018 1 94.7
  2. 1/2/2018 2 NA should be 1/2/2018 2 78.3
  3. 5/2/2018 2 NA should be 5/2/2018 2 78.3

Any suggestions on how to achieve this in r?

Or is it easy to perform in SQL, if so how can it be implemented?


Solution

  • a=with(dat,ave(Occupancy.,sub(".*?\\/","",Date),ID,FUN=function(x)mean(x,na.rm=T)))
    > transform(dat,b=replace(x<-Occupancy.,y<-is.na(x),a[y]))
           Date ID Occupancy.        b
    1  1/2/2018  1         95 95.00000
    2  2/2/2018  1         94 94.00000
    3  3/2/2018  1         94 94.00000
    4  4/2/2018  1         96 96.00000
    5  5/2/2018  1         94 94.00000
    6  6/2/2018  1         NA 94.71429
    7  7/2/2018  1         96 96.00000
    8  8/2/2018  1         94 94.00000
    9  1/2/2018  2         75 75.00000
    10 2/2/2018  2         NA 78.33333
    11 3/2/2018  2         79 79.00000
    12 4/2/2018  2         82 82.00000
    13 5/2/2018  2         NA 78.33333
    14 6/2/2018  2         76 76.00000
    15 7/2/2018  2         78 78.00000
    16 8/2/2018  2         80 80.00000