I have a dataset covering 50 years of daily values in the following form:
Date Var1 Var2 Var3 Var4 Var5 Var6
1994-01-01 2.2 0.1 98 0 7.5 3.6
1994-01-02 4.1 3.2 70 0 2.6 5.2
1994-01-03 10.7 3.3 0 76 4.3 4.5
1994-01-04 8.5 2.3 2.6 90 0.5 0.6
I want to standardize the data month-wise, i.e. compute the mean and sd of each calendar month from the 50 years of data and use them to standardize every variable. For that, I first need the mean and sd for every month across the 50 years (i.e. 12 means and 12 sds in total). I am new to R and I don't know how to compute the 50-year average for every month in the data.frame. I used the following function to get the standardized values:
Std_data <- data.Normalization(data, type = "n1", normalization = "column")
However, as far as I understand, this standardizes the values using the mean and sd of the entire column. I tried to separate the data month-wise using the function "group_by" and also tried the function "subset", but I still could not get the result I want.
You can perform this task using the plyr package.
library(plyr)
#generate data
set.seed(1992)
n=99
Year <- sample(2013:2015, n, replace = TRUE, prob = NULL)
Month <- sample(1:12, n, replace = TRUE, prob = NULL)
V1 <- abs(rnorm(n))*100
V2 <- abs(rnorm(n))*100
V3 <- abs(rnorm(n))*100
df <- data.frame(Year, Month, V1, V2, V3)
#calculate mean and sd for each month
avg_sd <- ddply(df, .(Month), summarize,
V1_m = mean(V1),
V2_m = mean(V2),
V3_m = mean(V3),
V1_sd = sd(V1),
V2_sd = sd(V2),
V3_sd = sd(V3)
)
#connect averages and sd's to data frame
df <- merge(df,avg_sd,by="Month")
#standardize your variables. I used subtraction, but you can use any formula you want, e.g. (V1 - V1_m) / V1_sd for a z-score
df <- ddply(df,.(Year, Month, V1, V2, V3, V1_m, V2_m, V3_m), summarize,
s_m_V1 = V1-V1_m,
s_m_V2 = V2-V2_m,
s_m_V3 = V3-V3_m,
s_sd_V1 = V1-V1_sd,
s_sd_V2 = V2-V2_sd,
s_sd_V3 = V3-V3_sd
)
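If what you want is the usual z-score, (value - monthly mean) / monthly sd, with the month taken from a Date column as in the question, the following is a minimal base R sketch rather than a definitive solution. The data frame `dat`, its simulated values, and the helper names `vars`, `z` and `std` are made up for illustration; it assumes the Date column can be parsed by as.Date and that there are no missing values.

```r
# Hypothetical daily data in the same layout as the question (Date, Var1, Var2)
set.seed(1)
dates <- seq(as.Date("1994-01-01"), as.Date("1996-12-31"), by = "day")
dat <- data.frame(Date = dates,
                  Var1 = rnorm(length(dates), mean = 10, sd = 3),
                  Var2 = runif(length(dates), min = 0, max = 100))

# Calendar month (1-12) of each row, pooled across all years
dat$Month <- as.integer(format(dat$Date, "%m"))

# Columns to standardize
vars <- c("Var1", "Var2")

# z-score within each calendar month: (x - monthly mean) / monthly sd
z <- lapply(dat[vars], function(x) {
  ave(x, dat$Month, FUN = function(v) (v - mean(v)) / sd(v))
})
std <- cbind(dat["Date"], as.data.frame(z))

# Sanity check: the monthly means of the standardized columns should be ~0
aggregate(std[vars], by = list(Month = dat$Month), FUN = mean)
```

Here ave() applies the function to each month's values and returns the results in the original row order, so no merge step is needed. If your data contain NA values, you would probably want mean(v, na.rm = TRUE) and sd(v, na.rm = TRUE) inside the function.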