Standardize data using monthly mean and sd


I have a dataset of daily values covering 50 years, in the following form:

    Date        Var1  Var2  Var3  Var4  Var5  Var6
    1994-01-01   2.2   0.1    98     0   7.5   3.6
    1994-01-02   4.1   3.2    70     0   2.6   5.2
    1994-01-03  10.7   3.3     0    76   4.3   4.5
    1994-01-04   8.5   2.3   2.6    90   0.5   0.6

I want to standardize the data month-wise, i.e. compute the mean and sd of each calendar month from the 50 years of data and use them to standardize every variable. For that, I first need the mean and sd for every month across the 50 years (i.e. 12 means and 12 sds in total). I am new to R and I don't know how to compute the 50-year average for every month in the data.frame. I used the following function to get the standardized values:

    Std_data <- data.Normalization(data, type = "n1", normalization = "column")

However, as far as I understand, this standardizes the values using the mean and sd of the entire column. I tried to separate the data by month using the function "group_by" and also tried the function "subset", but I still could not get the result I want.


Solution

  • You can perform this task using the plyr package.

    library(plyr)
    
    #generate data
    set.seed(1992)
    n=99
    Year <- sample(2013:2015, n, replace = TRUE, prob = NULL)
    Month <- sample(1:12, n, replace = TRUE, prob = NULL)
    V1 <- abs(rnorm(n))*100
    V2 <- abs(rnorm(n))*100
    V3 <- abs(rnorm(n))*100
    
    df <- data.frame(Year, Month, V1, V2, V3)
    
    #calculate mean and sd for each month
    avg_sd <- ddply(df, .(Month), summarize,
      V1_m = mean(V1),
      V2_m = mean(V2),
      V3_m = mean(V3),
      V1_sd = sd(V1),
      V2_sd = sd(V2),
      V3_sd = sd(V3)
      )
    
    #connect averages and sd's to data frame
    df <- merge(df,avg_sd,by="Month")
    
    
    #standardize your variables: (value - monthly mean) / monthly sd
    df <- transform(df,
            s_V1 = (V1 - V1_m) / V1_sd,
            s_V2 = (V2 - V2_m) / V2_sd,
            s_V3 = (V3 - V3_m) / V3_sd
            )
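
The example above starts from separate Year and Month columns, while the data in the question has a single Date column. A minimal base-R sketch of the same idea for that layout might look like the following. The sample data and the helper name `standardize_by` are made up for illustration, and the sketch assumes Date is (or can be converted to) class Date. It uses `ave()` to compute each calendar month's mean and sd across all years and apply them row by row.

```r
# Hypothetical daily data in the question's layout (only two variables shown).
set.seed(1)
dates <- seq(as.Date("1994-01-01"), as.Date("1996-12-31"), by = "day")
dat <- data.frame(Date = dates,
                  Var1 = rnorm(length(dates), mean = 10, sd = 3),
                  Var2 = runif(length(dates), min = 0, max = 100))

# Calendar month ("01".."12"), so every January across all years shares one group.
dat$Month <- format(dat$Date, "%m")

# Hypothetical helper: standardize a vector within groups,
# i.e. (x - group mean) / group sd, ignoring missing values.
standardize_by <- function(x, group) {
  ave(x, group, FUN = function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE))
}

# Apply the helper to every variable column.
vars <- c("Var1", "Var2")
std <- dat
std[vars] <- lapply(dat[vars], standardize_by, group = dat$Month)

# Sanity check: the per-month means of the standardized data should be ~0.
aggregate(std[vars], by = list(Month = std$Month), FUN = mean)
```

Because the grouping key is the calendar month only, the 12 means and 12 sds are taken over all years at once, which is what the question asks for. Adding the year to the grouping would instead standardize each month of each year separately.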