I want to scale the values in one column of a data frame
based on the values in another column. Here is a simple example:
d<-data.frame(x=runif(5,0,10),y=sample(c(1,2),size=5,replace=TRUE))
gives the output:
x y
1 1.0895865 2
2 0.8261554 2
3 5.3503761 2
4 3.3940759 1
5 6.2786637 1
I want to scale the x values based on the y values, so what I want is to have:
(x|y=1 - average(x's | y=1))/std.dev(x's|y=1)
and then replace the x values in d with the scaled values, and similarly for the x values with y=2.
What I have done so far is a bit clunky:
d1<-subset(d,y==1)
d2<-subset(d,y==2)
d1$x<-(d1$x-mean(d1$x))/sd(d1$x)
d2$x<-(d2$x-mean(d2$x))/sd(d2$x)
I then bind all the results back into one big data frame, but this is tedious: my actual data has 50 different values of y, and I'd like to do this for several different columns.
You can easily do this using group_by and mutate from the dplyr package:
library(dplyr)
# standardize x within each level of y and store the result back in d
d <- d %>%
  group_by(y) %>%
  mutate(x = (x - mean(x)) / sd(x)) %>%
  ungroup()
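Since the question also asks about doing this for multiple columns, the same idea can probably be extended with across(), which is available in dplyr 1.0.0 and later. This is only a sketch: the second numeric column z and the small example data frame are made up for illustration, and scaled is just a name I picked for the result.

```r
library(dplyr)

# Hypothetical data: two numeric columns (x, z) and a grouping column y
set.seed(1)
d <- data.frame(
  x = runif(6, 0, 10),
  z = rnorm(6),
  y = rep(c(1, 2), each = 3)
)

# Standardize every listed column within each level of y
scaled <- d %>%
  group_by(y) %>%
  mutate(across(c(x, z), ~ (.x - mean(.x)) / sd(.x))) %>%
  ungroup()
```

After this, each of x and z should have mean 0 and standard deviation 1 within every group of y. Note that a group with a single row gives NA, because sd() of one value is NA.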