Tags: r, dplyr, lazy-evaluation, operator-precedence, nse

Evaluation order inconsistency with dplyr mutate


I have two functions that I use inside a mutate call. One produces per-row results as expected, while the other repeats the same value for every row:

library(dplyr)

df <- data.frame(X = rpois(5, 10), Y = rpois(5,10))

pv <- function(a, b) {
  fisher.test(matrix(c(a, b, 10, 10), 2, 2),
              alternative='greater')$p.value
}

div <- function(a, b) a/b

mutate(df,  d = div(X,Y), p = pv(X, Y))

which produces something like:

    X  Y         d         p
1  9 15 0.6000000 0.4398077
2  8  7 1.1428571 0.4398077
3  9 14 0.6428571 0.4398077
4 11 15 0.7333333 0.4398077
5 11  7 1.5714286 0.4398077

i.e. the d column varies, but p is constant, and its value does not correspond to the X and Y values in any of the rows.

I suspect this relates to NSE, but I do not understand how from what little I have been able to find out about it.

What accounts for the different behaviours of div and pv? How do I fix pv?


Solution

  • We need rowwise(), so that pv is evaluated once per row:

    df %>% 
        rowwise() %>% 
        mutate(d = div(X,Y), p = pv(X,Y))
    #    X     Y        d         p
    # <int> <int>    <dbl>     <dbl>
    #1    10     9 1.111111 0.5619072
    #2    12     8 1.500000 0.3755932
    #3     9     8 1.125000 0.5601923
    #4    11    16 0.687500 0.8232217
    #5    16    10 1.600000 0.3145350
    

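    In the OP's code, pv receives the whole 'X' and 'Y' columns as vectors rather than one pair of values at a time. matrix(..., 2, 2) keeps only the first four elements of c(a, b, 10, 10), so fisher.test returns a single p-value that matches no individual row, and mutate recycles that value down the column. div behaves differently because / is vectorised and returns one result per element.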
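
The single-value behaviour can be checked outside of mutate with base R alone. The sketch below reuses pv from the question and assumes the X and Y values shown in the question's printed output (typed in by hand for illustration). suppressWarnings is used only because some R versions warn when the data is longer than the matrix.

```r
pv <- function(a, b) {
  fisher.test(matrix(c(a, b, 10, 10), 2, 2),
              alternative = "greater")$p.value
}

# Example values copied from the question's output (assumption: any
# vectors of length > 1 would show the same effect)
X <- c(9L, 8L, 9L, 11L, 11L)
Y <- c(15L, 7L, 14L, 15L, 7L)

# The matrix pv() builds when it is handed whole columns:
# only the first four elements of c(X, Y, 10, 10) are used
m <- suppressWarnings(matrix(c(X, Y, 10, 10), 2, 2))
m

# One p-value comes back for the whole column, not one per row
p_all <- suppressWarnings(pv(X, Y))
length(p_all)
```

Here m contains only X[1] to X[4], and length(p_all) is 1, which is why every row of the mutate result shows the same number.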


    Or, as @Frank mentioned, mapply can be used to call pv on each pair of X and Y values:

    df %>%
       mutate(d = div(X,Y), p = mapply(pv, X, Y))
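
A related option, not part of the answer above, is base R's Vectorize, which wraps a function in mapply so that the function itself accepts whole columns. This is a minimal sketch: pv_vec is a hypothetical name, and X and Y are example values taken from the question's output.

```r
pv <- function(a, b) {
  fisher.test(matrix(c(a, b, 10, 10), 2, 2),
              alternative = "greater")$p.value
}

# Hypothetical name: a wrapper that calls pv once per (a, b) pair via mapply
pv_vec <- Vectorize(pv)

# Example values copied from the question's output
X <- c(9L, 8L, 9L, 11L, 11L)
Y <- c(15L, 7L, 14L, 15L, 7L)

p <- pv_vec(X, Y)   # one p-value per (X, Y) pair
length(p)
```

With such a wrapper, mutate(df, d = div(X, Y), p = pv_vec(X, Y)) should give per-row p-values without rowwise(). It is still a loop underneath, so it is a convenience rather than a speed-up.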