Tags: r, dataframe, dplyr, summarize

Custom function does not work on a column named "x" unless it is passed as .$x in dplyr summarise()


I wanted to create a custom function that calculates the confidence interval of a column and returns it as two columns called lower.bound and upper.bound. I also wanted this function to work inside dplyr::summarize().

The function works as expected in every case I tested, except when the column is named "x". In that case it raises a warning and returns NaN values. It only works when the column is passed explicitly as .$x. Here is an example. I don't understand the nuance; could you point me in the right direction?

library(dplyr)

set.seed(12)

# creates random data frame
z <- data.frame(
        x = runif(100),
        y = runif(100),
        z = runif(100)
)

# creates function to calculate confidence intervals
conf.int <- function(x, alpha = 0.05) {
        
        sample.mean <- mean(x)
        sample.n <- length(x)
        sample.sd <- sd(x)
        sample.se <- sample.sd / sqrt(sample.n)
        t.score <- qt(p = alpha / 2, 
                   df = sample.n - 1, 
                   lower.tail = FALSE)
        margin.error <- t.score * sample.se
        lower.bound <- sample.mean - margin.error
        upper.bound <- sample.mean + margin.error
        
        as.data.frame(cbind(lower.bound, upper.bound))
        
}

# This works as expected
z %>% 
        summarise(x = mean(y), conf.int(y))

# This does not
z %>% 
        summarise(x = mean(x), conf.int(x))

# This does 
z %>% 
        summarise(x = mean(x), conf.int(.$x))

Thanks!


Solution

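  • This is a "feature" of dplyr: summarise() evaluates its arguments in order, and each result is immediately available to the expressions that follow. After x = mean(x), the name x refers to the single mean value rather than the original column, so conf.int(x) receives a vector of length 1. The standard deviation of one value is undefined and the t distribution then has 0 degrees of freedom, hence the warning and the NaN values. conf.int(.$x) works because . is the magrittr placeholder for the original data frame z, which still holds the full column.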
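
To see why the failing call returns missing values, it can help to reproduce by hand what conf.int() receives once x has been replaced by its mean. The following is a minimal base R sketch that does not need dplyr; conf.int is restated in condensed form so the snippet runs on its own, and x.column, x.summarised, res.single and res.full are illustrative names that do not appear in the question.

```r
# Condensed restatement of the question's function, so this runs standalone.
conf.int <- function(x, alpha = 0.05) {
  sample.mean <- mean(x)
  sample.n <- length(x)
  sample.se <- sd(x) / sqrt(sample.n)
  t.score <- qt(p = alpha / 2, df = sample.n - 1, lower.tail = FALSE)
  margin.error <- t.score * sample.se
  data.frame(lower.bound = sample.mean - margin.error,
             upper.bound = sample.mean + margin.error)
}

set.seed(12)
x.column <- runif(100)

# What conf.int() sees in the failing call: the already summarised mean.
x.summarised <- mean(x.column)
length(x.summarised)  # 1

# sd() of a single value is NA and qt() is asked for 0 degrees of freedom,
# so both bounds come back missing (the warning is silenced here).
res.single <- suppressWarnings(conf.int(x.summarised))

# What conf.int() sees when it is handed the original 100 values.
res.full <- conf.int(x.column)

print(res.single)
print(res.full)
```

The second result has finite bounds because the function still sees all 100 observations, which is the situation both workarounds in this answer restore.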

    Possible options are:

    1. Change the name of the variable to store the mean value
    library(dplyr)
    
    z %>% summarise(x1 = mean(x), conf.int(x))
    
    #         x1 lower.bound upper.bound
    #1 0.4797154   0.4248486   0.5345822
    
    2. Change the order, so that conf.int() runs before x is overwritten
    z %>% summarise(conf.int(x), x = mean(x))
    
    #  lower.bound upper.bound         x
    #1   0.4248486   0.5345822 0.4797154
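
Because the problem only shows up as a warning and NaN values, one possible safeguard is to make the function reject inputs that are too short. The sketch below is an assumption layered on top of the question's code, not part of the original answer: conf.int.checked is a hypothetical name, and the wording of the error message is invented for illustration.

```r
# Hypothetical guarded variant of conf.int(): it stops when fewer than two
# values are supplied instead of silently returning missing bounds.
conf.int.checked <- function(x, alpha = 0.05) {
  sample.n <- length(x)
  if (sample.n < 2) {
    stop("conf.int.checked() needs at least 2 values, got ", sample.n,
         "; was the column overwritten earlier in summarise()?")
  }
  sample.mean <- mean(x)
  sample.se <- sd(x) / sqrt(sample.n)
  t.score <- qt(p = alpha / 2, df = sample.n - 1, lower.tail = FALSE)
  margin.error <- t.score * sample.se
  data.frame(lower.bound = sample.mean - margin.error,
             upper.bound = sample.mean + margin.error)
}

set.seed(12)
x.column <- runif(100)

# A full column passes the check and yields finite bounds.
ok <- conf.int.checked(x.column)

# A single value (such as an already computed mean) now fails loudly;
# the error message is captured here so the script keeps running.
msg <- tryCatch(conf.int.checked(mean(x.column)),
                error = function(e) conditionMessage(e))
print(ok)
print(msg)
```

Used inside summarise(), a guard like this would turn the silent NaN result into an error, which makes the name clash easier to spot.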