I am currently processing experimental data for my thesis and am running into a problem with scipy curve_fit.
This is a study of LED emission with the following model depicting the absorption spectra for a specific LED composition/wavelength.
The model is this:
The basic idea is that we have experimental data and want to fit this equation to get a best estimate of a vertical shift in the data, which is caused by the equipment used in the experiment. To get that vertical shift, the function passed to curve_fit takes the form a + c * E * np.sqrt(E-bandE) * np.exp(-E*b). bandE (Eg) is the bandgap energy of the material, which is given in the code section. E is the photon energy.
The values I am using are in a pandas dataframe; I have copied them into lists here so you can copy and paste them (if you want):
photon_energy = [1.1271378805005456, 1.1169834851807208, 1.1070104183487501, 1.0972138659739825, 1.0875891829391229, 1.0781318856961741, 1.0688376453022415, 1.0597022808124787, 1.0507217530089832, 1.0418921584458825, 1.0332097237921667, 1.0246708004550413, 1.016271859467705, 1.0080094866265041, 0.9998803778633872, 0.9918813348404801, 0.9840092607544446, 0.9762611563390552, 0.9686341160551564, 0.9611253244578295, 0.9537320527312309, 0.9464516553821375, 0.939281567083788, 0.9322192996621053, 0.9252624392168658, 0.918408643370815, 0.9116556386401471, 0.9050012179201461, 0.898443238080145, 0.8919796176623023, 0.885608334679, 0.8793274245039717, 0.8731349778525352, 0.8670291388465735, 0.8610081031601389, 0.8550701162417932, 0.8492134716100002, 0.8434365092180953, 0.8377376138855407, 0.8321152137923491, 0.8265677790337335]
s2c = [1.0711371944297785, 1.0231329828975677, 1.0994106908895496, 1.5121380434280387, 1.4362625879245816, 1.6793735384201034, 1.967376254925342, 2.718958670464331, 2.8657461347457933, 3.2265806746948247, 4.073118384895329, 5.002080377098846, 5.518310980392261, 6.779117609004787, 7.923629188601875, 9.543272102194026, 11.061716095291905, 12.837722885549315, 15.156654004011116, 17.604461138085984, 20.853321055852934, 24.79640344112394, 28.59835938028905, 32.5257456, 37.87676923906976, 42.15321400245093, 46.794297771521705, 56.44267690099888, 61.60473904566305, 70.99822229568558, 77.60736232076566, 84.37513036736146, 92.9038746946938, 107.54475674330527, 117.91910226690293, 137.67481655050688, 158.02001455302846, 176.37334256204952, 195.20886164268876, 215.87011902349641, 240.41535423461914]
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
bandE = 0.7435616030790153  # bandgap energy Eg (eV)
# rebuild the dataframe from the two lists above
new_df = pd.DataFrame({'Photon Energy': photon_energy, 'S2c': s2c})
def exp_fit(E, a, b, c):
    # return a + c * E * np.sqrt(E - bandE) * np.exp(-E/0.046)  # Eg and k are already defined previously
    return a + c * E * np.sqrt(E - bandE) * np.exp(-E * b)
E = np.linspace(np.min(new_df['Photon Energy']), np.max(new_df['Photon Energy']), 1000)
# p0 is the best guess of the a, b, and c values
popt, pcov = curve_fit(exp_fit, new_df['Photon Energy'], new_df['S2c'], maxfev=10000, p0=[0, 500/23, 1e+9])
print(popt)
plt.plot(new_df['Photon Energy'], new_df['S2c'], 'o', label='S2c')
plt.plot(new_df['Photon Energy'], exp_fit(new_df['Photon Energy'], *popt), '-', label='S2c fit')
plt.ylabel('Emission Intensity (a.u.)')
plt.xlabel('Photon Energy (eV)')
plt.yscale('log')
plt.legend()
plt.show()
And this is what we end up getting.
out: [1.59739310e+00 2.50268369e+01 9.55186101e+11]
So after a long discussion with the person I am working with (we aren't that knowledgeable about Python or data science), we agree that everything except the a coefficient fits really well (b doesn't really matter because it will be explicitly calculated at a later step; c matters a lot and it appears to be of the right order of magnitude). Because it is a vertical shift, we expect a to be a constant, but the curve is diverging as a result of it.
As mentioned in the question title and the previous paragraph, we are expecting a to be about 5e-4, or within that order of magnitude, but we are getting something that is way too large for this experiment. If anyone is proficient with scipy's curve_fit, do help us out!
Additional info: we used to use something called OriginLab (think of a more expensive Microsoft Excel), but the license is hella expensive, so we are trying to use Python instead. This method does work in OriginLab and does not result in a divergence in the fit, so we figured it might have something to do with the algorithm that curve_fit uses.
Apparently the trouble is due to an unsuitable fitting criterion.
LMSE (Least Mean Square Error) is probably what your software implements. This is not a good choice of fitting criterion when the data extend over several decades.
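To make this concrete, here is a minimal arithmetic sketch. The two data values and the "fitted" values are made up for illustration (they only mimic the 1 to 240 range of the data in the question); the names y, y_fit, abs_sq and rel_sq are hypothetical.

```python
import numpy as np

# Illustrative values only: one point at each end of a range like the one in the question.
y = np.array([240.0, 1.0])
# Hypothetical model values: 5% too high at the large point, 100% too high at the small one.
y_fit = np.array([252.0, 2.0])

abs_sq = (y_fit - y) ** 2        # terms of the ordinary (absolute) least-squares sum
rel_sq = ((y_fit - y) / y) ** 2  # terms of the relative least-squares sum

print("absolute squared errors:", abs_sq)
print("relative squared errors:", rel_sq)
```

With the absolute criterion, the 5% miss at the large point costs 144 while the 100% miss at the small point costs only 1, so the optimizer has little incentive to get the small values (where the offset a matters) right. With the relative criterion the ranking is reversed.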
LMSRE (Least Mean Square Relative Error) is recommended for data like yours.
See below the comparison of results.
NOTE: The expected value of about a=0.0005 is absurd compared to the range of the data, from 1 to 240. It would have no effect, just like a=0. Maybe there is a mix-up in scales or units?