r, scale, normalize, rescale

Normalize by set standard deviation from mean of every column (excluding first)


I have a dataset below:

  A       B     C      D
500       2     4      6
501       6     8     45
502       4     7      9 

How do I normalize every column except the first so that its values stay within a set deviation from that column's mean?

So for example below are the means for each column:

B = 4
C = 6.333
D = 20

I then want to normalize each column so that its values are no more than 25% away from the column mean in either direction.

I think this can be done with rescale, but I don't know how to apply it to all columns:

library(scales)
rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x)))

I know this is one way to normalize, but it doesn't take the 25% bounds around the mean into account:

normalized <- function(x){
  return((x-min(x)) / (max(x)-min(x)))
}

library(dplyr)  # provides %>%, mutate_at(), vars(), one_of()

normalized_dataset <- df %>% 
  mutate_at(vars(-one_of("A")), normalized)

Solution

  • I assume the function rescale comes from package scales.

    This is a typical example of the use of the *apply family of functions.
    I will work on a copy of the data and rescale the copy. If you don't want to keep the original, it's a simple matter to modify the code below.

    dat2 <- dat
    
    dat2[-1] <- lapply(dat2[-1], function(x)
        scales::rescale(x, to = c(mean(x) - 0.25*mean(x), mean(x) + 0.25*mean(x))))
    
    dat2
    #    A B        C        D
    #1 500 3 4.750000 15.00000
    #2 501 5 7.916667 25.00000
    #3 502 4 7.125000 15.76923
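
    To make explicit what the rescaling does, here is a minimal base R sketch of the same linear map: the min-max formula from the question, stretched into the interval from 75% to 125% of the column mean. The helper name rescale_around_mean and its frac argument are hypothetical and not part of scales. The sketch assumes each column has a positive mean and is not constant (otherwise the bounds flip or the division by zero yields NaN).

    ```r
    # Hypothetical helper (an illustration, not part of the scales package):
    # min-max scale x into [mean - frac * mean, mean + frac * mean].
    rescale_around_mean <- function(x, frac = 0.25) {
      m  <- mean(x)
      lo <- m - frac * m   # lower bound, e.g. 75% of the mean
      hi <- m + frac * m   # upper bound, e.g. 125% of the mean
      # (x - min) / (max - min) maps x onto [0, 1]; then stretch to [lo, hi].
      lo + (x - min(x)) / (max(x) - min(x)) * (hi - lo)
    }

    # Same data as in the question, defined here so the sketch runs on its own.
    dat <- data.frame(A = 500:502, B = c(2, 6, 4), C = c(4, 8, 7), D = c(6, 45, 9))

    dat3 <- dat
    dat3[-1] <- lapply(dat3[-1], rescale_around_mean)  # skip the first column
    dat3
    ```

    The smallest value in each column lands on the lower bound and the largest on the upper bound, so the column mean after rescaling is generally not the original mean.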
    

    Data.

    dat <- read.table(text = "
      A       B     C      D
    500       2     4      6
    501       6     8     45
    502       4     7      9 
    ", header = TRUE)
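
Since the question uses a dplyr pipeline, the same rescaling can probably also be written with mutate() and across(). This is a sketch that assumes dplyr 1.0.0 or later and an installed scales package. The data frame is rebuilt here so the snippet runs on its own, and the name dat_dplyr is made up for illustration.

```r
library(dplyr)

# Same data as in the question.
dat <- data.frame(A = 500:502, B = c(2, 6, 4), C = c(4, 8, 7), D = c(6, 45, 9))

# Rescale every column except A into [0.75 * mean, 1.25 * mean].
dat_dplyr <- dat %>%
  mutate(across(-A, ~ scales::rescale(.x, to = mean(.x) * c(0.75, 1.25))))

dat_dplyr
```

The selection -A plays the role of vars(-one_of("A")) in the question's mutate_at() call, and column A is left unchanged.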