I have an array (lons) of longitude values in the range [-180, 180]. I need to find the mean of the time series. This is easily done with
np.mean(lons)
This straightforward mean, of course, doesn't work if the series contains values on either side of the dateline. What is the correct way of calculating the mean for all possible cases? Note that I would rather not have a condition that treats dateline-crossing cases differently.
I've played around with np.unwrap after converting from degrees to radians, but I know my calculations are wrong because a small percentage of cases give me mean longitudes near 0 degrees (the prime meridian), over Africa. These aren't possible, as this is an ocean data set.
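For reference, what I tried is something like the sketch below. (np.unwrap assumes consecutive samples differ by less than 180 degrees, so an unordered or gappy series can break it, which may explain the bad cases.)

import numpy as np

def unwrap_mean(lons_deg):
    # np.unwrap removes the ~360 degree jumps at the dateline, but only
    # works if consecutive samples differ by less than 180 degrees
    unwrapped = np.unwrap(np.deg2rad(lons_deg))
    mean_deg = np.rad2deg(np.mean(unwrapped))
    # wrap the result back into [-180, 180)
    return (mean_deg + 180.0) % 360.0 - 180.0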
Thanks.
EDIT: I now realise a more precise way of calculating the mean [lat, lon] position of a time series might be to convert to Cartesian coordinates. I may go down this route.
This is an application for directional statistics, where the angular mean is computed in the complex plane (see the Wikipedia article on directional statistics). Each angle is mapped to a unit vector on the complex unit circle; averaging these gives a complex number whose argument (angle) is the mean angle:
import numpy as np

def angular_mean(angles_deg):
    # map each angle to a unit vector on the complex unit circle and average
    N = len(angles_deg)
    mean_c = 1.0 / N * np.sum(np.exp(1j * angles_deg * np.pi / 180.0))
    # the argument of the mean vector is the angular mean
    return np.angle(mean_c, deg=True)
lons = [
    np.array([-175, -170, 170, 175]),  # broad distribution
    np.random.rand(1000)               # narrow distribution
]
for lon in lons:
    print(angular_mean(lon), np.mean(lon))
As you can see, the arithmetic mean and the angular mean are quite similar for a narrow distribution (here both are roughly 0.5), whereas they differ significantly for a broad distribution: the first example prints an angular mean of 180.0, on the dateline, against an arithmetic mean of 0.0, on the prime meridian.
Using Cartesian coordinates is not appropriate here, as the center of mass of points on a sphere lies inside the Earth, not on its surface; since you are using surface data, I assume you want the mean to be located on the surface as well.
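If you do want a mean [lat, lon] position via Cartesian vectors (as considered in the EDIT), the usual remedy is to average the 3D unit vectors and then project the centroid back onto the sphere by keeping only its direction. A minimal sketch, assuming lat/lon arrays in degrees (spherical_mean is just an illustrative name, not from any library):

import numpy as np

def spherical_mean(lats_deg, lons_deg):
    # convert lat/lon to 3D unit vectors on the sphere
    lat = np.deg2rad(np.asarray(lats_deg))
    lon = np.deg2rad(np.asarray(lons_deg))
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    # the centroid of the vectors lies inside the sphere ...
    mx, my, mz = x.mean(), y.mean(), z.mean()
    # ... so keep only its direction to project it back onto the surface
    mean_lat = np.degrees(np.arctan2(mz, np.hypot(mx, my)))
    mean_lon = np.degrees(np.arctan2(my, mx))
    return mean_lat, mean_lon

For longitudes alone (all latitudes zero) this reduces to the angular mean above.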