Following up on a previous question (link), it appears that grouping the data with group_by() does not change the result of scale() when using dplyr with pipes. Here is some sample code, slightly altered from the linked question.
library(dplyr)
set.seed(123)
n <- 1000
df <- data.frame(ID = sample(c("A","B","C","D","E"), size=n, replace=TRUE),
                 score = runif(n, 0, 10))
# scale within each ID group
scaledByID <-
  df %>%
  group_by(ID) %>%
  mutate(scaledScore = scale(score))
# scale across the whole data frame
notScaledByID <-
  df %>%
  mutate(scaledScore = scale(score))
mean(scaledByID$scaledScore == notScaledByID$scaledScore)
#[1] 1
packageVersion("dplyr")
#[1] ‘0.7.4’
The values are identical for scaledByID and notScaledByID, which leads me to believe it's not scaling by ID. Any suggestions?
Edit to add version of R and RStudio:
RStudio.Version()$version
#[1] ‘1.2.91’
R.version.string
#[1] "R version 3.4.2 (2017-09-28)"
The problem appears to be a bug in version 1.2.91 of RStudio. I downgraded to the stable build (version 1.1.383), and the new output for mean(scaledByID$scaledScore == notScaledByID$scaledScore) is 0.
The version of R is the same in both cases (3.4.2).
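For anyone who wants to rule dplyr in or out, the grouped result can be checked against a base R computation. The following is only a minimal sketch: it rebuilds the same sample data, computes per-group z-scores with ave(), and inspects the per-group means and standard deviations with tapply(). The column names scaledByID and scaledOverall are made up for this illustration. Note also that scale() returns a one-column matrix, so it is wrapped in as.numeric() here.

```r
set.seed(123)
n <- 1000
df <- data.frame(ID = sample(c("A","B","C","D","E"), size = n, replace = TRUE),
                 score = runif(n, 0, 10))

# Per-group z-scores using base R only (hypothetical column name)
df$scaledByID <- ave(df$score, df$ID,
                     FUN = function(x) (x - mean(x)) / sd(x))

# Z-scores over the whole column; scale() returns a matrix, so flatten it
df$scaledOverall <- as.numeric(scale(df$score))

# If scaling really happened per group, every group mean should be
# numerically zero and every group sd should be numerically one
groupMeans <- tapply(df$scaledByID, df$ID, mean)
groupSDs   <- tapply(df$scaledByID, df$ID, sd)

# For the ungrouped version the per-group means generally differ from zero
overallGroupMeans <- tapply(df$scaledOverall, df$ID, mean)

print(groupMeans)
print(groupSDs)
print(overallGroupMeans)
```

If the scaledScore column produced by the group_by() pipeline does not match df$scaledByID from this sketch (for example via all.equal() after sorting both the same way), the grouping is probably not being applied in that session.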