I have data: a sequence of integers (with repetition) multiplied by an unknown constant c, which I need to find. The data also contains noise:
import pandas as pd
import numpy as np
#data
mySize=[1000,1]
#Unknown constant c to find with a solver
c=np.random.uniform(0.5,10)
#example data: df=c*[integer list]+noise
df = pd.DataFrame(c* np.random.randint(-500,500,size=mySize) +np.random.uniform(-0.8,0.5,mySize))\
.sort_values(by=0).reset_index(drop=True)
##Export to excel and open
#if True:
# import os
# myPath=os.path.join(os.environ['temp'], 'myNumbers.xlsx')
# df.to_excel(myPath,sheet_name="Python data",engine='xlsxwriter')
# os.startfile(myPath)
myNumbers=df[0].tolist()
#auxiliary calculation: all pairwise absolute differences (len(myNumbers)**2 values),
#stored as a NumPy array so the Loss function below can divide them by a scalar
absx_y=np.array([abs(x-y) for x in myNumbers for y in myNumbers])
Screenshot of data and Δdata
(in red is the delta (difference) between consecutive numbers; when the underlying integers are equal, the difference is just the noise, which appears at the bottom of the plot as small values)
My idea is that, because c is the modulus separating most of the data, mod(data, c) ≈ 0.
So, I need to minimize this loss function:
def Loss(trial_c):
    #distance of every pairwise difference, in units of trial_c, to the nearest integer
    answer=np.sum( ((absx_y/trial_c-0.5) % 1 -0.5)**2 )
    #print("c="+str(c)+"; trial_c="+str(trial_c)+"; loss(trial_c)="+str(answer))
    return answer
This is minimal for the values of trial_c that turn the data into (near-)integers.
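As a quick sanity check (illustrative only; 1.37 is just an arbitrary wrong factor), the loss at the true c should be far smaller than at an unrelated trial value:
#sanity check (illustrative): the loss at the true c should be much smaller
#than at an arbitrary wrong value such as 1.37*c
print("Loss(c)      =", Loss(c))
print("Loss(1.37*c) =", Loss(1.37*c))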
My intention is to use a solver, but to understand the problem I first tried a brute-force approach: generate a grid of candidate values for c and compute the loss of each (this is awfully slow):
#generate the grid of trial c values
trialC=np.arange(1,1000,1)*(df[0].diff().max()/1000)
#lossOfTrialC =[Loss(xx) for xx in trialC]# <- This is horribly slow, so I use parallel calculation
from joblib import delayed, Parallel
lossOfTrialC = Parallel(n_jobs=8)(delayed(Loss)(xx) for xx in trialC)
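As a side note, the same grid can also be evaluated with plain NumPy broadcasting in blocks (a sketch; LossGrid is just an illustrative helper, and it produces the same numbers as the parallel loop above):
def LossGrid(trial_values, block=20):
    #evaluate Loss for a whole block of trial values at once via broadcasting;
    #the block size keeps the temporary (block x len(absx_y)) array small
    out=np.empty(len(trial_values))
    for i in range(0, len(trial_values), block):
        chunk=np.asarray(trial_values[i:i+block])
        ratio=absx_y[None,:]/chunk[:,None]
        out[i:i+block]=np.sum(((ratio-0.5) % 1 -0.5)**2, axis=1)
    return out
#lossOfTrialC = LossGrid(trialC)  #alternative to the Parallel(...) line above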
When I plot it:
import matplotlib.pyplot as plt
def PlotearXY(X,Y,Title=""):
    #plt.ion()
    fig = plt.figure()
    fig.subplots_adjust(bottom=0.2)
    ax = plt.gca()
    #ax.scatter(X,Y,marker='o',s=1)#.abs()
    ax.plot(X,Y,marker='o')#.abs()
    #ax.set_yscale('log')
    plt.title(Title)
    #plt.draw()
    plt.show()
    plt.close()
PlotearXY(trialC,lossOfTrialC,"objective c="+str(c))
I get a clear minimum of the loss function at the right trialC, but the loss is very noisy, full of local minima.
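(For reference, one workaround I can see is to polish the best brute-force grid point with a bounded one-dimensional search; the sketch below assumes a bracket of one grid step on either side of that point:)
from scipy.optimize import minimize_scalar
#sketch: refine the best grid value with a bounded 1-D search,
#bracketed by one grid step on each side
step=trialC[1]-trialC[0]
best=trialC[np.argmin(lossOfTrialC)]
refined=minimize_scalar(Loss, bounds=(best-step, best+step), method='bounded')
print("true c =", c, "; refined estimate =", refined.x)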
I tried this method in Excel, and it works. Because Excel uses SLSQP, I tried SciPy's SLSQP solver:
from scipy.optimize import minimize
#Constraints
maxC=max(myNumbers)
def constraint1(trial_c):
    #'ineq' means fun(x)>=0, so this keeps trial_c below the largest data value
    return maxC-trial_c
#Initial value for trialC
trial_c=[df[0].diff().max()]
#Bounds for trial_c
myBounds=[(0.0000001,df[0].diff().max())]
#inequalities for trial_c (not sure if necessary)
con1 = {'type': 'ineq', 'fun': constraint1}
cons = ([con1])
solution = minimize(Loss,trial_c,method='SLSQP',\
bounds=myBounds,constraints=cons)
But it generally fails, getting stuck in a local minimum.
The question is, "which solver should I use?"
The documentation of scipy.optimize.minimize lists many different solvers, but I have no clue which one would be best for this problem.
Or is my entire approach wrong?
I haven't looked too much into your implementation; however, I have a couple of points and suggestions: