I stumbled over the problem of smoothing data without pretending that the measured data is more accurate than it actually is.
While searching for simple solutions, I found many filtering approaches that leave the shape of the data unchanged (i.e. the number of datapoints is not reduced). From my point of view this means either that the data undergoes some kind of fitting (Scipy cookbook: Savitzky Golay), so it is no longer really the original data, or that it is statistically questionable, such as the "adjacent averaging" in Origin, which averages over a window of datapoints for every datapoint and thereby pretends an accuracy that is higher than it actually is (a minimal sketch of what I mean is below). I have experienced cases where such smoothing made signal artefacts appear significant that were not.
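For reference, this is roughly the kind of adjacent averaging I mean, as a plain numpy sketch (the function name and the window size of 8 are arbitrary choices of mine, not Origin's implementation):

import numpy as np

def adjacent_average(y, window=8):
    """Moving average that keeps the number of datapoints unchanged."""
    kernel = np.ones(window) / window
    # 'same' mode returns one averaged value per input point, which is
    # exactly the behaviour I consider statistically misleading
    return np.convolve(y, kernel, mode='same')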
I created an example where the actual features are misrepresented by averaging without reducing the number of datapoints. The main peak on the right in particular appears strongly broadened if you know the true data, but the large number of datapoints makes it look very neat and smooth. Furthermore, the noise level is almost completely smoothed away, yet some smaller features seem to rise in the center and appear as significant data although, as the original data shows, they are not.
I am now looking for simple/efficient ways to reduce the noise in a statistically correct manner, without imposing assumptions (such as a chosen fitting function) on the data, and I would like to learn about the advantages and disadvantages of different implementations.
One approach, originating in image processing, is downscale_local_mean from the skimage package.
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import downscale_local_mean

# Two-column data (x; y), separated by ";"
data = np.genfromtxt(r'example_smoothing_data.txt', delimiter=";")

# Average blocks of 8 consecutive rows (both columns), reducing the
# number of datapoints by a factor of 8
smoothed = downscale_local_mean(data, (8, 1))

plt.figure()
plt.plot(data[:, 0], data[:, 1], 'b-', label='Original data')
plt.plot(smoothed[:, 0], smoothed[:, 1], 'r-', label='Smoothed data')
plt.legend()
plt.show()
In the figure it can be clearly seen that the number of datapoints is reduced, which is essentially equivalent to using a larger integration window during the measurement: the loss of resolution in the x-direction pays for the increased accuracy in the y-direction.
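One implementation detail worth noting: as far as I can tell from the skimage docs, downscale_local_mean pads the array with cval (default 0) when the length is not a multiple of the block size, so the last averaged point can be biased; trimming the data beforehand avoids this. The same binning can also be done in plain numpy, which additionally makes it easy to attach a statistically meaningful error bar to each binned point. A minimal sketch, assuming the noise is roughly independent between neighbouring points (so the standard error of the mean is a sensible error bar) and using the block size 8 from the example above:

import numpy as np
import matplotlib.pyplot as plt

block = 8
data = np.genfromtxt(r'example_smoothing_data.txt', delimiter=";")

# Trim so the length is a multiple of the block size, then reshape so
# that each row holds one block of consecutive datapoints
n = (len(data) // block) * block
x = data[:n, 0].reshape(-1, block)
y = data[:n, 1].reshape(-1, block)

# Mean per block, with the standard error of the mean as error bar
x_binned = x.mean(axis=1)
y_binned = y.mean(axis=1)
y_err = y.std(axis=1, ddof=1) / np.sqrt(block)

plt.figure()
plt.errorbar(x_binned, y_binned, yerr=y_err, fmt='r.', label='Binned data')
plt.legend()
plt.show()

The error bars make the trade-off explicit: each binned point carries its own uncertainty estimate instead of silently pretending a higher accuracy.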