python numpy scipy data-fitting peakutils

Removing the baseline from data

I want to remove the baseline from some noisy data and ultimately find its peaks using Python (Raman scattering measurements, if anybody has experience with those).

Following this guide on the PeakUtils library (https://pythonhosted.org/PeakUtils/tutorial_a.html), the author fits the data to a polynomial with polyval, and then finds a baseline based on this and subtracts it.

My questions are: a) why bother fitting a polynomial to the data, why not just remove the baseline from the data as it is? and b) what is the significance of the parameters [0.002,-0.08,5] that they pass to polyval? Will I need to fine-tune these for my own data? Can someone explain how this works?

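import numpy
import peakutils
from matplotlib import pyplot

# x and y are defined in an earlier step of the tutorial
y2 = y + numpy.polyval([0.002,-0.08,5], x)
pyplot.figure(figsize=(10,6))
pyplot.plot(x, y2)
pyplot.title("Data with baseline")

# compute the baseline and plot the corrected data
base = peakutils.baseline(y2, 2)
pyplot.figure(figsize=(10,6))
pyplot.plot(x, y2-base)
pyplot.title("Data with baseline removed")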

My data is of the same shape as seen here (below) except this has obviously already had the background removed.

Solution

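In the PeakUtils guide, the list [0.002, -0.08, 5] passed to polyval holds the coefficients of the polynomial y = 0.002*x^2 - 0.08*x + 5, highest power first. polyval does not fit anything; it only evaluates that polynomial at each x, in order to create example data with a parabolic baseline (the "right part of a U-shape"). The baseline could just as well have been flat, a sloped straight line, or a higher-order curve, by passing a shorter or longer list of polynomial coefficients. The example data y2 is the sum of the previous example data y and this artificially added baseline, so these numbers belong to the tutorial only and are not something you need to tune for your own data.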
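To make the role of the coefficient list concrete, here is a small check that numpy.polyval simply evaluates the polynomial. The x range is an assumption chosen for illustration, not taken from your data.

```python
import numpy as np

# Assumed x axis, for illustration only
x = np.linspace(0, 100, 1000)

# Coefficients are listed from the highest power down to the constant term
coeffs = [0.002, -0.08, 5]
baseline = np.polyval(coeffs, x)

# The same curve written out by hand
explicit = 0.002 * x**2 - 0.08 * x + 5
same = np.allclose(baseline, explicit)

# Shorter lists give simpler baselines
flat = np.polyval([5], x)            # constant offset of 5
sloped = np.polyval([-0.08, 5], x)   # straight line -0.08*x + 5
```

Here `same` is True, which shows that polyval only generates the curve; no fitting happens at this stage.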

They then apply peakutils.baseline to the result y2, passing 2 as the degree of the polynomial to fit (parabolic here because the baseline looks parabolic; for other data you may have to try other degrees and compare). peakutils.baseline fits a parabola (i.e. calculates its coefficients), then returns the points on that parabola that correspond to each of your y2 points. Finally, y2-base is the data corrected for the baseline.
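To illustrate the idea of fitting a polynomial baseline and subtracting it, here is a rough sketch using only NumPy. It is not the PeakUtils implementation: it swaps in a simple iterative fit that clips peaks down to the current fit, and the function name poly_baseline, the synthetic peaks, and the noise level are all made up for the example.

```python
import numpy as np

def poly_baseline(y, deg=2, n_iter=50):
    """Illustrative iterative polynomial baseline (hypothetical helper,
    not the PeakUtils algorithm)."""
    t = np.linspace(-1.0, 1.0, len(y))   # scaled axis keeps the fit well conditioned
    work = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(t, work, deg), t)
        # Clip anything above the current fit so peaks stop pulling it upward
        work = np.minimum(work, fit)
    return fit

# Synthetic data (assumed): two Gaussian peaks, the tutorial's baseline, some noise
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 1000)
true_base = np.polyval([0.002, -0.08, 5], x)
peaks = (5 * np.exp(-(x - 30) ** 2 / (2 * 2.0 ** 2))
         + 8 * np.exp(-(x - 70) ** 2 / (2 * 3.0 ** 2)))
y2 = peaks + true_base + rng.normal(0, 0.05, x.size)

base = poly_baseline(y2, deg=2)   # one baseline value per data point
corrected = y2 - base             # data with the baseline removed
err = np.mean(np.abs(base - true_base))
```

The returned array has the same length as y2, which is why it can be subtracted point by point. Because the clipping also cuts into the noise, this kind of estimate usually ends up a little below the true baseline.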

Your data looks flat, so there should be no need to correct for a baseline (except perhaps for a constant vertical shift).
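If only a vertical shift is present, one simple option is to subtract a constant estimated from the data itself. The sketch below uses the median as that constant, which is an assumption that works when peaks occupy a small fraction of the points; the data here is synthetic.

```python
import numpy as np

# Synthetic flat-baseline data (assumed): constant offset 3.0, one narrow peak, noise
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 1000)
y = 3.0 + 10 * np.exp(-(x - 50) ** 2 / (2 * 1.5 ** 2)) + rng.normal(0, 0.05, x.size)

# The median is barely affected by a few peak points, so it approximates the offset
offset = np.median(y)
shifted = y - offset

# Location of the tallest point after removing the offset
peak_x = x[np.argmax(shifted)]
```

If the peaks cover a large part of the spectrum, the median will be biased upward and a low-degree fit would likely be the safer choice.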