I have an array that I want to calculate statistics for using astropy. What I have is:
from astropy.convolution import convolve
import numpy as np
x = np.random.randint(1, 10, size=(5, 5))
y = convolve(x, np.ones((3, 3)), boundary='extend', preserve_nan=True)
print(x)
print(y)
[[9 1 8 6 5]
[4 2 1 8 4]
[2 8 4 6 6]
[8 4 8 5 6]
[7 3 1 2 3]]
[[5.33333333 4.77777778 4.55555556 5.66666667 5.33333333]
[4.55555556 4.33333333 4.88888889 5.33333333 5.55555556]
[4.66666667 4.55555556 5.11111111 5.33333333 5.66666667]
[5.44444444 5. 4.55555556 4.55555556 4.77777778]
[6. 4.66666667 3.22222222 3.44444444 3.66666667]]
Each element in y
is the mean of a 3x3 box drawn around that position in x
. What I'd like to have instead of the mean is the coefficient of variation (standard deviation divided by mean). I'm not sure if this can be done in astropy or if I need to use something else like
from scipy.stats import variation
Astropy builds on numpy and scipy. Numpy is the low-level array library that implements the data storage and basic operations that are used by higher level libraries such as scipy and astropy. Understanding how numpy arrays work will help you work with astropy.
Since you want to do statistics on a rolling window, something like scipy.stats.variation
will not help you directly. It computes statistics on elements grouped by axis, which is not how your elements are grouped. You could simulate a rolling window through axes using something like np.lib.stride_tricks.as_strided
, but that is dangerous, error prone, and not very efficient. Instead, I suggest using the convolution method similar to what you have already done.
First, remember that the variance can be expressed as
var = sum(x^2) / N - (sum(x) / N)^2
So let's do this:
kernel = np.ones((3, 3))
mean_x = convolve(x, kernel, boundary='extend', preserve_nan=True)
mean_square_x = convolve(x**2, kernel, boundary='extend', preserve_nan=True)
You now have sum(x) / N
for each window, and sum(x^2) / N
. The standard deviation is just the square root of the variance:
std_x = np.sqrt(mean_square_x - mean_x**2)
The result you want is the ratio
result = std_x / mean_x
You can do the whole computation more concisely as
m = convolve(x, kernel, boundary='extend', preserve_nan=True)
result = np.sqrt(convolve(x**2, kernel, boundary='extend', preserve_nan=True) - m**2) / m