I have some seriously noisy histogram data which makes up a series of peaks. I need to find the area under the first one, so I was planning on doing a spline fit and taking the derivative to find the relevant stationary points (i.e. the first trough). However, I'm not sure how to approach taking the derivative of the fitted data (or indeed how to practically fit the data in the first place).
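For anyone landing here with the same question, the spline-plus-derivative idea can be sketched with the Curve Fitting Toolbox's spline functions. This is a minimal sketch, not tested code: `binCenters`/`counts` are placeholder vectors for the histogram, and the smoothing level 0.99 is a guess you'd need to tune.

```matlab
pp   = csaps(binCenters, counts, 0.99);  % cubic smoothing spline fit
dpp  = fnder(pp);                        % its first derivative
z    = fnzeros(dpp);                     % stationary points (2-by-n intervals)
stat = mean(z, 1);                       % collapse intervals to point estimates
d2   = fnval(fnder(dpp), stat);          % second derivative at each stationary point
trough = stat(find(d2 > 0, 1, 'first')); % first minimum = right edge of first peak
ipp  = fnint(pp);                        % antiderivative of the fitted spline
area = fnval(ipp, trough) - fnval(ipp, min(binCenters)); % area under first peak
```

The key point is that once the fit is a spline object, the derivative, its zeros, and the integral all come from `fnder`/`fnzeros`/`fnint` rather than from numerically differentiating the noisy data itself.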
So initially a mixed-Gaussian approach looked really promising. The issue is that, as well as being noisy, the signal source actually varies between a couple of distinct cases, so a combination of Gaussians which worked on one data-set would often fail (drastically) on another.
Getting around this was possible, but the more general solutions introduced drift/bias into the approximations, which had an inconsistent impact depending on both the noise and the underlying case.
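For anyone wanting to try that route anyway, a minimal two-Gaussian fit in MATLAB looks something like the sketch below, using the Curve Fitting Toolbox's built-in sum-of-Gaussians library models (`'gauss1'` through `'gauss8'`). This is the general idea rather than my exact code, and `binCenters`/`counts` are again placeholder names:

```matlab
% Two-component sum-of-Gaussians fit; fit() wants column vectors
f = fit(binCenters(:), counts(:), 'gauss2');
plot(f, binCenters, counts);  % eyeball the fit against the raw histogram
```

The fragility I hit shows up here as sensitivity to the choice of component count and to the starting points the fitter picks, which is what varied between the distinct cases.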
After faffing with this for a while, I opted to try MATLAB's curvedspline instead. This ended up providing a much better approach, which I then combined with some multidimensional cluster analysis to pick out places where the spline fitting had clearly gone awry. Using this meant that rather than fitting to bad data (i.e. data which gave serious deviations from the bulk data) I was able to discard these outliers. Specifically, I used domain knowledge to work out cases where, by definition, outliers were the result of a poor fit and not sample variance. This only led to a couple of data points per sample being discarded (1-2 out of 20) and gave pretty clean results in the end.
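To give a flavour of the discard-and-refit step, here's a simplified sketch. It uses a robust residual threshold in place of my actual cluster-analysis criterion (which leaned on domain knowledge I can't reproduce here), and `binCenters`/`counts`, the smoothing level, and the 3x MAD cut-off are all assumptions:

```matlab
pp   = csaps(binCenters, counts, 0.99);           % initial smoothing spline
res  = counts - fnval(pp, binCenters);            % residuals against the fit
keep = abs(res - median(res)) <= 3 * mad(res, 1); % flag wild deviations (mad: Stats Toolbox)
pp2  = csaps(binCenters(keep), counts(keep), 0.99); % refit without the outliers
```

The point of refitting is the same either way: the handful of discarded points would otherwise drag the spline away from the bulk of the data, and dropping them before the final fit is what cleaned up the results.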