Search code examples
pythonnumpyscipyastropy

Calculate coefficient of variation of window in astropy


I have an array that I want to calculate statistics for using astropy. What I have is:

from astropy.convolution import convolve
import numpy as np

x = np.random.randint(1, 10, size=(5, 5))
y = convolve(x, np.ones((3, 3)), boundary='extend', preserve_nan=True)

print(x)
print(y)

[[9 1 8 6 5]
 [4 2 1 8 4]
 [2 8 4 6 6]
 [8 4 8 5 6]
 [7 3 1 2 3]]

[[5.33333333 4.77777778 4.55555556 5.66666667 5.33333333]
 [4.55555556 4.33333333 4.88888889 5.33333333 5.55555556]
 [4.66666667 4.55555556 5.11111111 5.33333333 5.66666667]
 [5.44444444 5.         4.55555556 4.55555556 4.77777778]
 [6.         4.66666667 3.22222222 3.44444444 3.66666667]]

Each element in y is the mean of a 3x3 box drawn around that position in x. What I'd like to have instead of the mean is the coefficient of variation (standard deviation divided by mean). I'm not sure if this can be done in astropy or if I need to use something else like

from scipy.stats import variation

Solution

  • Astropy builds on numpy and scipy. Numpy is the low-level array library that implements the data storage and basic operations that are used by higher level libraries such as scipy and astropy. Understanding how numpy arrays work will help you work with astropy.

    Since you want to do statistics on a rolling window, something like scipy.stats.variation will not help you directly. It computes statistics on elements grouped by axis, which is not how your elements are grouped. You could simulate a rolling window through axes using something like np.lib.stride_tricks.as_strided, but that is dangerous, error prone, and not very efficient. Instead, I suggest using the convolution method similar to what you have already done.

    First, remember that the variance can be expressed as

    var = sum(x^2) / N - (sum(x) / N)^2
    

    So let's do this:

    kernel = np.ones((3, 3))
    mean_x = convolve(x, kernel, boundary='extend', preserve_nan=True)
    mean_square_x = convolve(x**2, kernel, boundary='extend', preserve_nan=True)
    

    You now have sum(x) / N for each window, and sum(x^2) / N. The standard deviation is just the square root of the variance:

    std_x = np.sqrt(mean_square_x - mean_x**2)
    

    The result you want is the ratio

    result = std_x / mean_x
    

    You can do the whole computation more concisely as

    m = convolve(x, kernel, boundary='extend', preserve_nan=True)
    result = np.sqrt(convolve(x**2, kernel, boundary='extend', preserve_nan=True) - m**2) / m