Search code examples
arrayspython-3.xseriesminimizescipy-optimize

Linear Combination: Determine scalars for four series (spectra) to fit known spectrum


I have four "principal" spectra that I want to find coefficients/scalars for to best fit my data. The goal is to know how much of principal x is in the data. I am trying to get the "percent composition" of each principal spectrum to the overall spectrum (I.e. 50% a1, 25% a2, 20% a3, 5% a4.)

    #spec = spectrum, a1,a2,a3,a4 = principal components these are all nx1 dimensional arrays
    c = 0 #some scalar
    d = 0 #some scalar
    e = 0 #some scalar
    g = 0 #some scalar
def f(spec, c, d, e, g):
    y = spec - (a1.multiply(c) - a2.multiply(d) - a3.multiply(e)- a4.multiply(g))
    return np.dot(y, y)
    res = optimize.minimize(f, spec, args=(c,d,e,g), method='COBYLA', options={'rhobeg': 1.0, 'maxiter': 1000, 'disp': False, 'catol': 0.0002}) #z[0], z[1], z[2], z[3]
    best = res['x']

The issue I'm having is that it doesn't seem to give me the scalar values (c,d,e,g) but instead another nx1 dimensional array. Any help greatly appreciated. Also open to other minimize/fit techniques.


Solution

  • After some work, I found two methods that give similar results for this problem.

    mport numpy as np
    import pandas as pd
    import csv
    import os
    from scipy import optimize
    
    path = '[insert path]'
    os.chdir(path)
    data = 'data.csv' #original spectra
    factors = 'factors.csv' #factor spectra
    nfn = 'weights.csv' #new filename
    df_data = pd.read_csv(data, header = 0) # read in the spectrum file
    df_factors = pd.read_csv(factors, header = 0)
    # this array a is going to be our factors
    a = df_factors[['0','1','2','3']
    

    Need to seperate the factor spectra from the original data frame.

    a1 = pd.Series(a['0']) 
    a2 = pd.Series(a['1'])
    a3 = pd.Series(a['2'])
    a4 = pd.Series(a['3'])
    b = df_data[['0.75M']] # original spectrum!
    b = pd.Series(b['0.75M']) # needs to be in a series
    

    x0 is my initial guess for my coefficient

    x0 = np.array([0., 0., 0.,0.])
    def f(c):
        return b -((c[0]*a1)+(c[1]*a2)+(c[2]*a3)+(c[3]*a4))
    

    using least squares from Scipy optimize least squares and then later with minimize from the same package, both work, minimize is slightly better IMO.

    res = optimize.least_squares(f, x0, bounds = (0, np.inf))
    xbest = res.x
    
    x0 = np.array([0., 0., 0., 0.])
    def f(c):
        y = b -((c[0]*a1)+(c[1]*a2)+(c[2]*a3)+(c[3]*a4))
        return np.dot(y,y)
    
    res = optimize.minimize(f, x0, bounds = ((0,np.inf),(0,np.inf),(0,np.inf),(0,np.inf)))