Tags: python, scipy, curve-fitting, scipy-optimize, multivariate-testing

Optimizing a variable applied to discrete data in order to minimize an error function


I'm trying to optimize a function based on discrete data sets I got in the lab.

Spectrum_1 and Spectrum_2 are experimental data sets of length N. Each array contains one value measured at each wavelength (lambda) in the lambdas array.

So:

len(Spectrum_1) == len(Spectrum_2) == len(lambdas)

Spectrum_1 and Spectrum_2 each have the symbolic form

Spectrum = G(T,lambda) * E(lambda)

E(lambda) is unknown, but it is known not to vary with T. E(lambda) would be discrete data (a set of points defined at each value in the lambdas array).

G(T, lambda) is a known, continuous, well-defined function of T and wavelength (lambda). More specifically, it is the Planck blackbody radiation equation as a function of wavelength.
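For reference, a minimal sketch of what my G looks like (the standard Planck spectral radiance per unit wavelength, in SI units; the exact constants and units in my own code may differ):

import numpy as np

# Physical constants in SI units
h = 6.62607015e-34    # Planck constant [J s]
c = 2.99792458e8      # speed of light [m/s]
k_B = 1.380649e-23    # Boltzmann constant [J/K]

def G(T, lam):
    # Planck's law per unit wavelength: T in kelvin, lam in metres
    return (2.0 * h * c**2 / lam**5) / np.expm1(h * c / (lam * k_B * T))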

E_1 and E_2 should be equal:

E_1 = Spectrum_1/np.array([G(T_1, lamda) for lamda in lambdas])
E_2 = Spectrum_2/np.array([G(T_2, lamda) for lamda in lambdas])

T_1 and T_2 are unknown, but I know they both lie within the 400 to 1000 range. T_1 and T_2 are scalar values.

Knowing that, I need to minimize:

np.sum(np.array([(E_1[i] - E_2[i])**2 for i in range(len(lambdas))]))

Or at least that's what I think I should minimize. Ideally (E_1[i] - E_2[i]) == 0, but that won't be the case given that the experimental data in Spectrum_1 and Spectrum_2 contain noise and distortions due to atmospheric transmission.

I'm not very familiar with optimizing over multiple unknown variables (T_1 and T_2) in Python. I suppose I could brute-force test millions of combinations of T_1 and T_2, but I would like to do it properly. I wonder if anybody could help me.

I hear scipy.optimize could do it for me, but many methods ask for a Jacobian and a Hessian, and I'm unsure how to proceed given that I have experimental data (Spectrum_1 and Spectrum_2) and I'm not dealing with continuous/smooth functions.


Solution

  • Given the data and assumptions, I see no reason why scipy.optimize.minimize() wouldn't work.

    I hear scipy.optimize could do it for me, but many methods ask for Jacobian and Hessian

    Those aren't required. If you don't provide them, they will be estimated using numerical differentiation, i.e. varying the inputs a little and seeing how much the outputs change. This assumes the function is continuous, though.
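    Conceptually, that numerical estimate is just a loop like the following (an illustrative sketch only; SciPy's own finite-difference code is more careful about step sizes and method choice):

    import numpy as np

    def numerical_gradient(f, params, eps=1e-6):
        # Central-difference estimate of the gradient of f at params
        params = np.asarray(params, dtype=float)
        grad = np.empty_like(params)
        for i in range(params.size):
            step = np.zeros_like(params)
            step[i] = eps  # vary one input a little...
            grad[i] = (f(params + step) - f(params - step)) / (2 * eps)  # ...and see how the output changes
        return grad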

    I'm unsure how to proceed given I have experimental data (Spectrum_1 and Spectrum_2) and I'm not dealing with continuous/smooth functions.

    As I understand it, you do have a continuous function. You've said that G(T, lambda) is continuous, and the fact that you're applying it to many different data samples doesn't make it any less continuous. Similarly, the other operations involved (subtraction, squaring, division, and summation) are also continuous. (The exception is division by zero, but if G(T, lambda) can be zero, you can add a constraint to forbid that.)

    You could try something along these lines:

    import numpy as np
    from scipy.optimize import minimize
    
    
    # Put experimental data here
    Spectrum_1 = np.array([...])
    Spectrum_2 = np.array([...])
    lambdas = np.array([...])
    
    def loss(params):
        T_1, T_2 = params
        E_1 = Spectrum_1/np.array([G(T_1, lamda) for lamda in lambdas])
        E_2 = Spectrum_2/np.array([G(T_2, lamda) for lamda in lambdas])
        # Vectorized elementwise difference between E_1 and E_2
        difference = E_1 - E_2
        return np.sum(difference**2)
    
    result = minimize(
        loss,
        x0=[700, 700],  # Initial guess for T_1 and T_2, somewhere within the known 400-1000 range
    )
    print(result)
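
    Since you know T_1 and T_2 are both in the 400 to 1000 range, you can also pass bounds so the optimizer never leaves that range (a sketch; when bounds are given and no method is specified, minimize() selects a bounded method such as L-BFGS-B):

    result = minimize(
        loss,
        x0=[700, 700],
        bounds=[(400, 1000), (400, 1000)],  # known physical range for T_1 and T_2
    )
    print(result)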
    

    If you think this is a function which may have many local minima, you may also want to try one of the global optimizers. Also, all of the global optimizers except basinhopping work even when the function is not continuous.
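
    For example, scipy.optimize.differential_evolution needs only the loss function and the bounds (a sketch, reusing the loss function and the 400-1000 bounds from above):

    from scipy.optimize import differential_evolution

    result = differential_evolution(
        loss,
        bounds=[(400, 1000), (400, 1000)],  # known range for T_1 and T_2
        seed=0,  # optional: makes the stochastic search reproducible
    )
    print(result)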