Pass a vector of distribution functions to calculate a mean per case in R


I have several probability distribution functions defined using the pdqr package. Let's say they are A, B and C:

library(pdqr)

# Three normal densities wrapped as pdqr d-functions
A <- as_d(function(x) dnorm(x, mean = 3, sd = 1))
B <- as_d(function(x) dnorm(x, mean = 6, sd = 1))
C <- as_d(function(x) dnorm(x, mean = 2, sd = 2))

I have a large data.frame with a character column, distr, that names the appropriate PDF for each case, for example:

df <- data.frame(distr = c("A", "C", "A", "B", "B", "A", "C"))

I would like to generate the mean of each PDF per case. Individually this works like this for PDF A:

> pdqr::summ_mean(A)
[1] 3

Now I would like to generate the mean for each case based on the PDF named in distr. This means passing the PDF into pdqr::summ_mean(). I have tried the following, with the resulting errors:

> df$distr_mean <- summ_mean(df$distr)
Error: `f` is not pdqr-function. It should be function.
> 
> df$distr_mean <- summ_mean(invoke_map(df$distr))
Error in A() : argument "x" is missing, with no default
> 
> df$distr_mean <- df %>%
+   pull(distr) %>%
+   summ_mean()
Error: `f` is not pdqr-function. It should be function.

So either it doesn't recognise that a pdqr-function is being passed, or it wants an x-value. The latter doesn't make sense, since I want the mean over the entire distribution, not the density at a single x (passing a range like c(1:10) doesn't work either). Furthermore, as I understand it, the apply family and do.call only apply one single function, whereas I want to apply several different functions, named in a vector.

How to proceed?


Solution

  • One way to do this is to use the distr column as an argument to mget, which will return all the appropriate functions in a list. Just feed that list to summ_mean using sapply:

    sapply(mget(df$distr), pdqr::summ_mean)
    #> A C A B B A C 
    #> 3 2 3 6 6 3 2 
    

    Inside mutate, though, you'll need to tell mget which environment the functions are found in, because mget does not search enclosing environments by default (inherits = FALSE):

    library(dplyr)

    df %>% 
      mutate(distr_mean = sapply(mget(distr, envir = .GlobalEnv), pdqr::summ_mean))
    #>   distr distr_mean
    #> 1     A          3
    #> 2     C          2
    #> 3     A          3
    #> 4     B          6
    #> 5     B          6
    #> 6     A          3
    #> 7     C          2
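
A possible variation on the mget approach above, sketched here under some assumptions: since the question mentions a large data.frame, it may be worth computing each mean once per distinct distribution and then looking the result up by name, rather than calling summ_mean for every row. The named list `pdfs` and the vector `pdf_means` are hypothetical names introduced for this illustration; they are not part of pdqr. Using an explicit list also avoids depending on which environment the functions live in.

```r
library(pdqr)

# The distributions and data from the question
A <- as_d(function(x) dnorm(x, mean = 3, sd = 1))
B <- as_d(function(x) dnorm(x, mean = 6, sd = 1))
C <- as_d(function(x) dnorm(x, mean = 2, sd = 2))

df <- data.frame(distr = c("A", "C", "A", "B", "B", "A", "C"))

# Hypothetical registry: a named list mapping each label to its pdqr-function
pdfs <- list(A = A, B = B, C = C)

# One summ_mean() call per distinct distribution, returned as a named numeric vector
pdf_means <- vapply(pdfs, summ_mean, numeric(1))

# Look up each row's mean by name; as.character() guards against distr
# being a factor, which would otherwise be used as integer positions
df$distr_mean <- unname(pdf_means[as.character(df$distr)])
```

With this layout, adding a new distribution only requires adding one entry to the list, and the per-row step is a plain vector lookup.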