Tags: r, normalization, mean, r-factor

Getting Factor Means into the dataset after calculation


I am trying to create a normalization value for a variable I am working with based on individual conference means and SDs. I found the conference means using the function:

confavg=aggregate(base$AVG, by=list(base$confName), FUN=mean)

Now that I have the means for the 31 conferences, I want to go back and attach each conference's mean to every individual player in that conference, so I can easily calculate a normalization factor based on the conference mean.

I have tried to create large ifelse or if statements where confavg is the conference average.

ifelse((base$confName=="America East Conference"),confavg[1,2]->base$CAVG,0->base$CAVG)

but nothing works. Ideally I would want to take every player and say:

Normalization = (player average - conference average)/conference standard deviation

How should I go about doing that?

edit:

Here is some sample data:

AVG = c(.350,.400,.320,.220,.100,.250,.400,.450)
Conf = c("SEC","ACC","SEC","B12","P12","ACC","B12","P12")
Conf=as.factor(Conf)
sampleconfavg=aggregate(AVG, by=list(Conf), FUN=mean)
sampleconfsd=aggregate(AVG, by=list(Conf), FUN=sd)

So each player would have (their average - the conference average) / SD of the conference

so for the first guy it would be:

(.350 - .335) / 0.0212132 = 0.7071069

but I am hoping to build a function that does it for all people in my dataset. Thank you!

edit2:

Alright the answer below is amazing but I am running into (hopefully) one last problem. I want to basically do this process to three variables like:

base3=do.call(rbind, by(base3, base3$confName, FUN=function(x) { x$ScaledAVG <- scale(x$AVG); x}))
base3=do.call(rbind, by(base3, base3$confName, FUN=function(x) { x$ScaledOBP <- scale(x$OBP); x}))
base3=do.call(rbind, by(base3, base3$confName, FUN=function(x) { x$ScaledK.AB <- scale(x$K.AB); x}))

Which works but then when I search the datafile like:

base3[((base3$ScaledAVG>2)&(base3$ScaledOBP>2)&(base3$ScaledK.AB<.20)),]

it resets the ScaledK.AB value and doesn't use it as one of the conditions in the search.


Solution

  • Here is an example to scale iris$Sepal.Length, within groups of iris$Species:

    scaled.iris <- do.call(rbind, 
      by(iris, iris$Species,
         FUN=function(x) { x$Scaled.Sepal.Length <- scale(x$Sepal.Length); x }
      )
    )
    
    head(scaled.iris)
    ##          Sepal.Length Sepal.Width Petal.Length Petal.Width Species Scaled.Sepal.Length
    ## setosa.1          5.1         3.5          1.4         0.2  setosa          0.26667447
    ## setosa.2          4.9         3.0          1.4         0.2  setosa         -0.30071802
    ## setosa.3          4.7         3.2          1.3         0.2  setosa         -0.86811050
    ## setosa.4          4.6         3.1          1.5         0.2  setosa         -1.15180675
    ## setosa.5          5.0         3.6          1.4         0.2  setosa         -0.01702177
    ## setosa.6          5.4         3.9          1.7         0.4  setosa          1.11776320
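
The same pattern can be extended to several columns in a single `by()` pass instead of one `do.call(rbind, by(...))` call per variable. The following is only a sketch on `iris`; the `cols` vector and the `Scaled.` prefix are names chosen here for illustration. Because `scale()` returns a one-column matrix (with centering and scaling attributes), the result is wrapped in `as.numeric()` so that each new column is stored as a plain numeric vector.

```r
# Columns to standardize within each Species (illustrative choice)
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

scaled.iris <- do.call(rbind,
  by(iris, iris$Species, FUN = function(x) {
    for (col in cols) {
      # as.numeric() drops the matrix dimensions and attributes
      # that scale() attaches to its result
      x[[paste0("Scaled.", col)]] <- as.numeric(scale(x[[col]]))
    }
    x
  })
)

# Sanity check: within each group the scaled values should have
# mean (approximately) 0 and standard deviation 1
tapply(scaled.iris$Scaled.Sepal.Width, scaled.iris$Species, mean)
tapply(scaled.iris$Scaled.Sepal.Width, scaled.iris$Species, sd)

# The new columns can then be combined in an ordinary logical filter
subset.rows <- scaled.iris[scaled.iris$Scaled.Sepal.Length > 2 &
                           scaled.iris$Scaled.Sepal.Width < 0.2, ]
```

With plain numeric columns, the comparison operators in the filter behave the same way as for any other numeric column in the data frame.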
    

    Edit:

    Using your sample data (Conf and AVG only):

    d <- data.frame(Conf, AVG)
    dd <- do.call(rbind, by(d, d$Conf, FUN=function(x) { x$Scaled <- scale(x$AVG); x}))
    
    # Remove generated row names
    rownames(dd) <- NULL
    
    dd
    ##   Conf  AVG     Scaled
    ## 1  ACC 0.40  0.7071068
    ## 2  ACC 0.25 -0.7071068
    ## 3  B12 0.22 -0.7071068
    ## 4  B12 0.40  0.7071068
    ## 5  P12 0.10 -0.7071068
    ## 6  P12 0.45  0.7071068
    ## 7  SEC 0.35  0.7071068
    ## 8  SEC 0.32 -0.7071068
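
If the goal is specifically to get each conference's mean and SD into the player-level rows, as the question's title asks, base R's `ave()` is one alternative worth considering. It applies a function within each group and returns a vector aligned with the original rows, so the data frame keeps its original order. This is a sketch using the question's sample data; the column names `ConfMean`, `ConfSD`, and `Scaled` are made up for illustration.

```r
# Sample data from the question
AVG  <- c(.350, .400, .320, .220, .100, .250, .400, .450)
Conf <- factor(c("SEC", "ACC", "SEC", "B12", "P12", "ACC", "B12", "P12"))
d <- data.frame(Conf, AVG)

# ave() computes FUN within each level of Conf and repeats the result
# for every row belonging to that level
d$ConfMean <- ave(d$AVG, d$Conf, FUN = mean)
d$ConfSD   <- ave(d$AVG, d$Conf, FUN = sd)

# Normalization = (player average - conference average) / conference SD
d$Scaled <- (d$AVG - d$ConfMean) / d$ConfSD

round(d$Scaled, 4)
## [1]  0.7071  0.7071 -0.7071 -0.7071 -0.7071 -0.7071  0.7071  0.7071

# Alternative closer to the question's starting point: merge the
# aggregate() result back onto the player rows (this reorders rows by Conf)
conf.means <- aggregate(AVG ~ Conf, data = d[, c("Conf", "AVG")], FUN = mean)
names(conf.means)[2] <- "ConfMeanMerged"
merged <- merge(d, conf.means, by = "Conf")
```

Keeping the mean and SD as explicit columns makes the normalization formula visible in the code, at the cost of two extra columns compared with calling `scale()` directly.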