r, transform, scale

Transform data to specific mean/sd but fall within range (i.e. > 0)


I need to transform some data to have a specific mean and sd. I'm working from the question linked below, except that I also need every transformed value to be positive, i.e. greater than 0.

https://stats.stackexchange.com/questions/46429/transform-data-to-desired-mean-and-standard-deviation

Does anyone have any idea? Thank you.


Solution

  • y <- rnorm(1000, mean = 10, sd = 3) # normally distributed random data
    mean_target <- 5 # desired mean of the data
    sd_target   <- 1 # desired sd of the data
    y2 <- mean_target + (y - mean(y)) * sd_target/sd(y) # formula from the linked question
    print(y2)
    

    If negative values are a problem, you could then set every value below zero to zero. This will of course shift the mean and sd slightly away from the targets.

    y2[y2 < 0] <- 0
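
    How much the clipping moves the mean and sd depends on how many values fall below zero. The following is only a sketch with made-up targets (mean 1, sd 0.5) and an arbitrary seed, chosen so that some values are likely to be clipped; `pmax(y2, 0)` is used as an equivalent to the assignment above.

    ```r
    set.seed(1)                              # arbitrary seed, for reproducibility only
    y  <- rnorm(1000, mean = 10, sd = 3)
    y2 <- 1 + (y - mean(y)) * 0.5 / sd(y)    # assumed targets: mean 1, sd 0.5
    y3 <- pmax(y2, 0)                        # same effect as y2[y2 < 0] <- 0

    sum(y2 < 0)                              # how many values were clipped
    c(mean_before = mean(y2), mean_after = mean(y3))  # clipping can only raise the mean
    c(sd_before   = sd(y2),   sd_after   = sd(y3))    # clipping can only lower the sd
    ```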
    

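    It is not possible to hit an arbitrary positive mean and sd exactly and also guarantee that all values stay positive: the transformation is linear, so a point far enough below the original mean ends up below zero whenever the target sd is large relative to the target mean. So the only way I can think of is to manipulate the outliers.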
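
    Whether the rescaling alone keeps everything positive can be checked up front. The rescaling preserves each point's z-score, so the smallest rescaled value is mean_target minus sd_target times the number of standard deviations the original minimum lies below the original mean. A minimal sketch, where the helper name `rescale_stays_positive` is hypothetical and the data are made up:

    ```r
    # Hypothetical helper: TRUE if the linear rescaling keeps every value above zero.
    rescale_stays_positive <- function(y, mean_target, sd_target) {
      z_low <- (mean(y) - min(y)) / sd(y)   # distance of the minimum below the mean, in sd units
      mean_target - z_low * sd_target > 0   # smallest rescaled value must be positive
    }

    y <- c(1, 2, 3, 4, 10)                  # made-up data: mean 4, minimum about 0.85 sd below it
    rescale_stays_positive(y, mean_target = 5, sd_target = 1)  # TRUE
    rescale_stays_positive(y, mean_target = 1, sd_target = 2)  # FALSE
    ```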

    Rereading your question, I think you may want an iterative approach that forces the desired mean and sd. Assuming you are willing to throw away the outliers (values that are not positive), the following approach may help. But be warned that it may change your data significantly!

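    applyMeanSD <- function(y, mean_target, sd_target, max_iter = 10){
        iter <- 0
        # always run once, then repeat while non-positive values remain, up to max_iter passes
        while(iter == 0 || (any(y <= 0) && iter < max_iter)){
            iter <- iter + 1
            y <- y[y > 0] # throws away all non-positive values
            if (length(y) > 1)
                y <- mean_target + (y - mean(y)) * sd_target/sd(y)
            else
                return (NULL) # fewer than two values left, sd is undefined
        }
        return(y) # may still contain non-positive values if max_iter was reached
    }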
    
    test <- rnorm(100, 0, 1)
    test2 <- applyMeanSD(test, 1, 0.5)
    test #negative values included
    test2 #no negative values
    mean(test2)
    sd(test2)