Tags: r, dplyr, normalization, scaling

Scaling by group in R using dplyr: grouping and non-grouping seem to generate the same result


Following up on a previous question (link): grouping the data does not seem to change the result of scaling when using piping and dplyr. Here is some sample code, slightly altered from the linked question.

library(dplyr)

set.seed(123)
n <- 1000
df <- data.frame(ID = sample(c("A","B","C","D","E"), size=n, replace=TRUE),
                 score = runif(n, 0, 10))

scaledByID <- 
        df %>%
        group_by(ID) %>%
        mutate(scaledScore = scale(score))

notScaledByID <- 
        df %>%
        mutate(scaledScore = scale(score))

mean(scaledByID$scaledScore == notScaledByID$scaledScore)
#[1] 1

packageVersion("dplyr")
#[1] ‘0.7.4’

The values are identical for scaledByID and notScaledByID, which leads me to believe the scores are not being scaled within each ID. Any suggestions?

Edit to add version of R and RStudio:

RStudio.Version()$version
#[1] ‘1.2.91’

R.version.string
#[1] "R version 3.4.2 (2017-09-28)"

Solution

  • The problem appears to be a bug in version 1.2.91 of RStudio. After I downgraded to the stable build (version 1.1.383), mean(scaledByID$scaledScore == notScaledByID$scaledScore) returned 0.

    The version of R is the same in both cases (3.4.2).
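
For anyone who wants to confirm that grouping is taking effect in their own setup, here is a minimal sketch that compares the dplyr result against a per-group scaling computed with base R only. It reuses the sample data from the question. The name `reference`, the use of `ave()`, and the `as.numeric()` wrapper (which drops the matrix attributes that `scale()` returns) are my additions for illustration, not part of the original code.

```r
library(dplyr)

set.seed(123)
n <- 1000
df <- data.frame(ID = sample(c("A", "B", "C", "D", "E"), size = n, replace = TRUE),
                 score = runif(n, 0, 10))

# Grouped scaling via dplyr; as.numeric() turns the one-column matrix
# returned by scale() into a plain numeric vector
scaledByID <- df %>%
  group_by(ID) %>%
  mutate(scaledScore = as.numeric(scale(score))) %>%
  ungroup()

# Reference (assumption: base R only): center and scale within each ID
reference <- ave(df$score, df$ID, FUN = function(x) (x - mean(x)) / sd(x))

# Evaluates to TRUE only if dplyr scaled within each ID
isTRUE(all.equal(scaledByID$scaledScore, reference))

# When grouping took effect, each group should have mean ~0 and sd ~1
scaledByID %>%
  group_by(ID) %>%
  summarise(groupMean = mean(scaledScore), groupSD = sd(scaledScore))
```

If the `all.equal()` check fails, or the per-group means are clearly not zero, the grouping was ignored somewhere along the pipeline. That would be consistent with the behavior described in the question.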