python scipy curve-fitting scipy-optimize

scipy curve_fit coefficient does not align with expected value (physics relevant?)


I am currently processing experimental data for my thesis and am running into a problem with scipy curve_fit.

Background

This is a study of LED emission with the following model depicting the absorption spectra for a specific LED composition/wavelength.

The model is this:

equation for model

The basic idea is that we have experimental data and want to fit this equation to get a best estimate of a vertical shift in the data that is caused by the equipment used in the experiment. To get that vertical shift, the function passed to curve_fit takes the form a + c * E * np.sqrt(E-bandE) * np.exp(-E*b). bandE (Eg in the model) is the bandgap energy of the material, which is given in the code section. E is the photon energy.

What I did

The values I am using are in a pandas dataframe; I have included them here as lists so you can copy and paste them (if you want):

photon_energy = [1.1271378805005456, 1.1169834851807208, 1.1070104183487501, 1.0972138659739825, 1.0875891829391229, 1.0781318856961741, 1.0688376453022415, 1.0597022808124787, 1.0507217530089832, 1.0418921584458825, 1.0332097237921667, 1.0246708004550413, 1.016271859467705, 1.0080094866265041, 0.9998803778633872, 0.9918813348404801, 0.9840092607544446, 0.9762611563390552, 0.9686341160551564, 0.9611253244578295, 0.9537320527312309, 0.9464516553821375, 0.939281567083788, 0.9322192996621053, 0.9252624392168658, 0.918408643370815, 0.9116556386401471, 0.9050012179201461, 0.898443238080145, 0.8919796176623023, 0.885608334679, 0.8793274245039717, 0.8731349778525352, 0.8670291388465735, 0.8610081031601389, 0.8550701162417932, 0.8492134716100002, 0.8434365092180953, 0.8377376138855407, 0.8321152137923491, 0.8265677790337335]
s2c = 1.0711371944297785, 1.0231329828975677, 1.0994106908895496, 1.5121380434280387, 1.4362625879245816, 1.6793735384201034, 1.967376254925342, 2.718958670464331, 2.8657461347457933, 3.2265806746948247, 4.073118384895329, 5.002080377098846, 5.518310980392261, 6.779117609004787, 7.923629188601875, 9.543272102194026, 11.061716095291905, 12.837722885549315, 15.156654004011116, 17.604461138085984, 20.853321055852934, 24.79640344112394, 28.59835938028905, 32.5257456, 37.87676923906976, 42.15321400245093, 46.794297771521705, 56.44267690099888, 61.60473904566305, 70.99822229568558, 77.60736232076566, 84.37513036736146, 92.9038746946938, 107.54475674330527, 117.91910226690293, 137.67481655050688, 158.02001455302846, 176.37334256204952, 195.20886164268876, 215.87011902349641, 240.41535423461914]

The fit

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Rebuild the dataframe from the two lists above
new_df = pd.DataFrame({'Photon Energy': photon_energy, 'S2c': s2c})

bandE = 0.7435616030790153  # bandgap energy Eg (eV)

def exp_fit(E, a, b, c):
    # a: vertical shift, b: exponential decay rate, c: amplitude
    # return  a + c * E * np.sqrt(E - bandE) * np.exp(-E/0.046)# Eg and k are already defined previously
    return a + c * E * np.sqrt(E - bandE) * np.exp(-E * b)

E = np.linspace(np.min(new_df['Photon Energy']), np.max(new_df['Photon Energy']), 1000)

# p0 is the best guess of the a, b, and c values
popt, pcov = curve_fit(exp_fit, new_df['Photon Energy'], new_df['S2c'], maxfev=10000, p0=[0, 500/23, 1e+9])
print(popt)
plt.plot(new_df['Photon Energy'], new_df['S2c'], 'o', label='S2c')
plt.plot(new_df['Photon Energy'], exp_fit(new_df['Photon Energy'], *popt), '-', label='S2c fit')
plt.ylabel('Emission Intensity (a.u.)')
plt.xlabel('Photon Energy (eV)')
plt.yscale('log')
plt.legend()
plt.show()

And this is what we end up getting.

my fit

out: [1.59739310e+00 2.50268369e+01 9.55186101e+11]

So after a long discussion with the person I am working with (we aren't that knowledgeable about Python or data science), we agree that everything except the a coefficient fits really well (b doesn't really matter because it will be explicitly calculated at a later step; c matters a lot, and it appears to be of the right order of magnitude). Because a is a vertical shift, we expect it to be a constant, but the curve is diverging as a result of it.

The problem

As mentioned in the question title and the previous paragraph, we are expecting a to be about 5e-4, or within that order of magnitude, but we are getting something that is way too large for this experiment. If anyone is proficient with the curve_fit feature of scipy, do help us out!

Additional info: we used to use something called OriginLab (a more expensive Microsoft Excel), but the license is hella expensive, so we are trying to use Python instead. This method does work in OriginLab and does not result in a divergence in the fit, so we figured it might have something to do with the algorithm that curve_fit uses.


Solution

  • Apparently the trouble is due to an unsuitable fitting criterion.

    LMSE (Least Mean Square Error) is probably what your software implements. This is not a good fitting criterion for data that extends over several decades.

    LMSRE (Least Mean Square Relative Error) is recommended for data like yours.

    See below the comparison of results.

    [image: comparison of results]

    NOTE: The expected value of about a=0.0005 is absurd compared to the range of the data, from 1 to 240. It would have no effect, just like a=0. Maybe a mix-up in scales or units?
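
One way to try a relative-error criterion with curve_fit is to pass the measured values as sigma, so that each residual is divided by its own data point before squaring. The sketch below is only an illustration under stated assumptions: it uses synthetic data generated from the question's model (a_true, b_true, c_true, the 2% noise level and the starting guess p0 are all made up for the demo), and it is not necessarily how the comparison in this answer was produced.

```python
import numpy as np
from scipy.optimize import curve_fit

bandE = 0.7435616030790153  # bandgap energy from the question (eV)

def exp_fit(E, a, b, c):
    # Same model as in the question: offset + amplitude * E * sqrt(E - Eg) * exp(-b E)
    return a + c * E * np.sqrt(E - bandE) * np.exp(-E * b)

# Synthetic stand-in for the measurements (assumed values, NOT the real data)
a_true, b_true, c_true = 0.5, 25.0, 1e12
E = np.linspace(0.8266, 1.1271, 41)
rng = np.random.default_rng(0)
y = exp_fit(E, a_true, b_true, c_true) * (1 + 0.02 * rng.standard_normal(E.size))

p0 = [0.0, 24.0, 5e11]  # assumed starting guess, chosen close to the synthetic truth

# Default criterion: minimises sum((y - f)**2), dominated by the largest y values
popt_abs, _ = curve_fit(exp_fit, E, y, p0=p0, maxfev=10000)

# Relative criterion: sigma=y makes curve_fit minimise sum(((y - f) / y)**2)
popt_rel, pcov_rel = curve_fit(exp_fit, E, y, p0=p0, sigma=y, maxfev=10000)

# One-sigma uncertainties of a, b, c from the covariance matrix
perr_rel = np.sqrt(np.diag(pcov_rel))

print("absolute-error fit:", popt_abs)
print("relative-error fit:", popt_rel, "+/-", perr_rel)
```

With the real measurements, E and y would be replaced by new_df['Photon Energy'] and new_df['S2c']. This weighting requires every data value to be nonzero, and the uncertainty on a printed at the end gives a rough idea of how well the data constrain the vertical shift.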