
How should the values of a list be scaled such that they meet standard deviation and mean requirements?


I have lists of values that I want to have scaled to meet certain standard deviation and mean requirements. Specifically, I want the datasets standardised to mean 0 with standard deviation 1, except for datasets for which all values are greater than 0; these I want scaled such that their mean is 1.

What would be a good way to do this type of thing in Python?


Solution

  • If you're working with data in Python, you're going to want to be using the scientific stack, in particular numpy, scipy, and pandas. What you're looking for is the z-score, and that's a common enough operation that it's built into scipy as scipy.stats.zscore.

    Starting from a random array with non-zero mean and non-unity stddev:

    >>> import numpy as np
    >>> import scipy.stats
    >>> data = np.random.uniform(0, 100, 10**5)
    >>> data.mean(), data.std()
    (49.950550280158893, 28.910154760235972)
    

    We can renormalize:

    >>> renormed = scipy.stats.zscore(data)
    >>> renormed.mean(), renormed.std()
    (2.0925483568134951e-16, 1.0)
    

    And shift if we want:

    >>> if (data > 0).all():
    ...     renormed += 1
    ...     
    >>> renormed.mean(), renormed.std()
    (1.0000000000000002, 1.0)
    
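    Since you have many lists to process, the two steps above can be collected into a small helper; this is just a sketch, and the function name standardise is my own:

    ```python
    import numpy as np
    import scipy.stats

    def standardise(values):
        """Scale to mean 0 and std 1; if every value is positive,
        shift the result so the mean is 1 instead."""
        data = np.asarray(values, dtype=float)
        renormed = scipy.stats.zscore(data)
        if (data > 0).all():
            renormed += 1
        return renormed
    ```

    Called on an all-positive list this returns mean 1 and std 1; otherwise mean 0 and std 1.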

    We could do this manually, of course:

    >>> (data - data.mean())/data.std()
    array([-0.65558504,  0.24264144, -0.1112242 , ..., -0.40785103,
           -0.52998332,  0.10104563])
    

    (Note that both zscore and numpy's std default to a delta degrees of freedom (ddof) of 0, i.e. the denominator is N. If you want N-1, pass ddof=1.)
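    To make the ddof point concrete, a quick sketch (the variable names are my own) showing that the scipy call and the manual version line up for either choice:

    ```python
    import numpy as np
    import scipy.stats

    data = np.random.uniform(0, 100, 10**5)

    # ddof=0 (the default): denominator N
    z0 = scipy.stats.zscore(data)
    assert np.allclose(z0, (data - data.mean()) / data.std())

    # ddof=1: denominator N - 1, the "sample" standard deviation
    z1 = scipy.stats.zscore(data, ddof=1)
    assert np.allclose(z1, (data - data.mean()) / data.std(ddof=1))
    ```

    Worth knowing if you later move to pandas: Series.std defaults to ddof=1, not 0.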