df <- structure(list(Month = c(1, 2, 3, 1, 2, 3),
Product = c('A', 'A','A','B','B','B'),
Amount1 = c(100,NA,200,200,400,NA),
calculated1 = c(100,150,200,200,400,300),
Amount2 = c(100,NA,300,200,NA,500),
calculated2 = c(100,200,300,200,350,500)),
.Names=c("Month","Product","Amount1","Calculated1",
"Amount2","Calculated2"),row.names = c(NA, 6L), class = "data.frame")
I need your help on the above sample problem. I have month, product, Amount1 and Amount2 columns in R data table. Not all the months have amount information for all products as shown in above dataframe. if NA found(missing value) for any product, i want to calculate average of remaining months for respective products as shown in "Calculated" columns. Any help on this is highly appreciated, i am new learner.
Thank you!
I have tried filtering based on Amount1 and then calculate average and assign values to different data frame. Later use join to bring the values for nulls. But since i have 2 different Amount columns to work on filtering on one column would not help.
I used this sample data:
df <- data.frame(Month = c(1, 2, 3, 1, 2, 3),
Product = c('A', 'A','A','B','B','B'),
Amount1 = c(100,NA,200,200,400,NA),
calculated1 = c(100,150,200,200,400,300),
Amount2 = c(100,NA,300,200,NA,500),
calculated2 = c(100,200,300,200,350,500))
The processing is as follows:
library(dplyr)
df %>%
group_by(Month, Product) %>% # grouping using both variables
mutate(Calc1_M_P = mean(Amount1), # temporary variables (columns) are created
Calc2_M_P = mean(Amount2)) %>% # mean is calculated so that if there are NAs the function returns NA
ungroup() %>%
group_by(Product) %>% # The data.frame is grouped again using only Product
mutate(Calc1_P = mean(Amount1, na.rm = T), # New tmp variables are created. NAs skipped by mean()
Calc2_P = mean(Amount2, na.rm = T)) %>%
mutate(Calc1 = coalesce(Calc1_M_P, Calc1_P), # final variables are constructed. If first is present the econd is skipped otherwise the second is taken
Calc2 = coalesce(Calc2_M_P, Calc2_P),
Calc1_P = NULL, # drop tmp variables
Calc2_P = NULL,
Calc1_M_P = NULL,
Calc2_M_P = NULL
)
The above code returns:
# A tibble: 6 × 8
# Groups: Product [2]
Month Product Amount1 calculated1 Amount2 calculated2 Calc1 Calc2
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 A 100 100 100 100 100 100
2 2 A NA 150 NA 200 150 200
3 3 A 200 200 300 300 200 300
4 1 B 200 200 200 200 200 200
5 2 B 400 400 NA 350 400 350
6 3 B NA 300 500 500 300 500
You see that Calc1 == calculated1
and Calc2 == calculated2
. So the result is as expected.