Average by Prod if null in R

df <- structure(list(Month = c(1, 2, 3, 1, 2, 3), 
Product = c('A', 'A','A','B','B','B'),
Amount1 = c(100,NA,200,200,400,NA),
calculated1 = c(100,150,200,200,400,300),
Amount2 = c(100,NA,300,200,NA,500),
calculated2 = c(100,200,300,200,350,500)),         
.Names=c("Month","Product","Amount1","Calculated1",
"Amount2","Calculated2"),row.names = c(NA, 6L), class = "data.frame")

I need your help on the above sample problem. I have month, product, Amount1 and Amount2 columns in R data table. Not all the months have amount information for all products as shown in above dataframe. if NA found(missing value) for any product, i want to calculate average of remaining months for respective products as shown in "Calculated" columns. Any help on this is highly appreciated, i am new learner.

Thank you!

I have tried filtering based on Amount1 and then calculate average and assign values to different data frame. Later use join to bring the values for nulls. But since i have 2 different Amount columns to work on filtering on one column would not help.

Solution

I used this sample data:

df <- data.frame(Month = c(1, 2, 3, 1, 2, 3), 
                     Product = c('A', 'A','A','B','B','B'),
                     Amount1 = c(100,NA,200,200,400,NA),
                     calculated1 = c(100,150,200,200,400,300),
                     Amount2 = c(100,NA,300,200,NA,500),
                     calculated2 = c(100,200,300,200,350,500))

The processing is as follows:

library(dplyr)
df %>% 
  group_by(Month, Product) %>% # grouping using both variables
  mutate(Calc1_M_P = mean(Amount1), # temporary variables (columns) are created
         Calc2_M_P = mean(Amount2)) %>% # mean is calculated so that if there are NAs the function returns NA
  ungroup() %>% 
  group_by(Product) %>% # The data.frame is grouped again using only Product
  mutate(Calc1_P = mean(Amount1, na.rm = T), # New tmp variables are created. NAs skipped by mean()
         Calc2_P = mean(Amount2, na.rm = T)) %>% 
  mutate(Calc1 = coalesce(Calc1_M_P, Calc1_P), # final variables are constructed. If first is present the econd is skipped otherwise the second is taken
         Calc2 = coalesce(Calc2_M_P, Calc2_P),
         Calc1_P = NULL, # drop tmp variables
         Calc2_P = NULL,
         Calc1_M_P = NULL,
         Calc2_M_P = NULL
               )

The above code returns:

# A tibble: 6 × 8
# Groups:   Product [2]
  Month Product Amount1 calculated1 Amount2 calculated2 Calc1 Calc2
  <dbl> <chr>     <dbl>       <dbl>   <dbl>       <dbl> <dbl> <dbl>
1     1 A           100         100     100         100   100   100
2     2 A            NA         150      NA         200   150   200
3     3 A           200         200     300         300   200   300
4     1 B           200         200     200         200   200   200
5     2 B           400         400      NA         350   400   350
6     3 B            NA         300     500         500   300   500

You see that Calc1 == calculated1 and Calc2 == calculated2. So the result is as expected.