I have several probability distribution functions defined using the pdqr package. Let's say they are A, B and C:
A <- as_d(function(x)dnorm(x, mean = 3, sd = 1))
B <- as_d(function(x)dnorm(x, mean = 6, sd = 1))
C <- as_d(function(x)dnorm(x, mean = 2, sd = 2))
I have a large data.frame with a character column, distr, that names the appropriate PDF for each case, let's say:
df <- data.frame(distr = c("A", "C", "A", "B", "B", "A", "C"))
I would like to compute the mean of the PDF for each case. For a single PDF such as A, this works:
> pdqr::summ_mean(A)
[1] 3
Now I would like to compute the mean for each case based on the PDF named in distr. This means passing the PDF into pdqr::summ_mean(). I have tried the following, with the resulting errors:
> df$distr_mean <- summ_mean(df$distr)
Error: `f` is not pdqr-function. It should be function.
>
> df$distr_mean <- summ_mean(invoke_map(df$distr))
Error in A() : argument "x" is missing, with no default
>
> df$distr_mean <- df %>%
+ pull(distr) %>%
+ summ_mean()
Error: `f` is not pdqr-function. It should be function.
So, either it doesn't understand that a pdqr-function is being passed, or it needs an x-value, which doesn't make sense, since I want the mean over the entire distribution, not at a single x (passing a range like c(1:10) also doesn't work). Furthermore, as I understand it, any apply or do.call function only passes a single function, while I want to pass several different functions, given in a vector.
How to proceed?
One way to do this is to use the distr column as an argument to mget, which will return all the appropriate functions in a list. Just feed that list to summ_mean using sapply:
sapply(mget(df$distr), pdqr::summ_mean)
#> A C A B B A C
#> 3 2 3 6 6 3 2
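The reason the lookup is needed can be seen with a small base R sketch. Here A is only a plain stand-in function (an assumption for illustration, not a real pdqr-function), and name and fns are hypothetical variable names: the column holds the string "A", not the object A, so something has to turn the name into the function.

```r
# Stand-in for a pdqr-function: any function bound to the name "A".
A <- function(x) dnorm(x, mean = 3, sd = 1)

name <- "A"
is.function(name)       # the string itself is not a function
is.function(get(name))  # get() looks up the object bound to that name

# mget() is the vectorised version: one list element per name,
# with repeated names allowed.
fns <- mget(c("A", "A"))
names(fns)
```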
Though inside mutate you'll need to tell mget which environment the functions will be found in:
df %>%
  mutate(distr_mean = sapply(mget(distr, envir = .GlobalEnv), pdqr::summ_mean))
#> distr distr_mean
#> 1 A 3
#> 2 C 2
#> 3 A 3
#> 4 B 6
#> 5 B 6
#> 6 A 3
#> 7 C 2
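If you would rather not depend on where the functions live, a possible alternative is to keep them in a named list and index it by the distr column. This is only a sketch: pdfs and pdf_means are hypothetical names introduced here, and it assumes every value in distr has a matching entry in the list. It also evaluates summ_mean once per distribution rather than once per row, which may help with a large data.frame.

```r
library(pdqr)

A <- as_d(function(x) dnorm(x, mean = 3, sd = 1))
B <- as_d(function(x) dnorm(x, mean = 6, sd = 1))
C <- as_d(function(x) dnorm(x, mean = 2, sd = 2))

df <- data.frame(distr = c("A", "C", "A", "B", "B", "A", "C"))

# Hypothetical lookup table: names must match the values used in df$distr.
pdfs <- list(A = A, B = B, C = C)

# Compute each mean once; vapply guarantees a numeric vector back.
pdf_means <- vapply(pdfs, summ_mean, numeric(1))

# Index by name; as.character() guards against distr being a factor.
df$distr_mean <- unname(pdf_means[as.character(df$distr)])
```

Because the lookup goes through the list rather than an environment, the same line works unchanged inside mutate or inside a function.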