I am trying to build a custom sigmoid-shaped function to scale my data during preprocessing. The goal is a sigmoid-shaped function that outputs values between 0 and 1 and only takes positive inputs: it approaches 0 as the input approaches 0, and 1 as the input approaches +infinity. The key point is that I want to be able to choose the inflection points of the 'S' shape at will. Here is a little sketch (forgive my Paint skills).
The points I want to choose are marked A and B; ideally they sit somewhere midway along the bends that connect the linear part of the function to the asymptotes.
Here is what I have tried so far: fitting a classic logistic function to two points.
Here is the function:
```python
import numpy as np

def sigmoid(x, x0, k):
    # Classic logistic function centred at x0 with steepness k
    return 1 / (1 + np.exp(-k * (x - x0)))
```
And here is the fit:
```python
import numpy as np
from scipy.optimize import curve_fit

ydata = [0.1, 0.9]
xdata = [0.22, 1.34]
p0 = [np.median(xdata), 1]  # initial guess for (x0, k)
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0=p0, method='dogbox')
```
Here `xdata` holds the x-coordinates of points A and B (which I want to be able to vary), and `ydata` holds the arbitrary values I map A and B to, so that they end up roughly at the inflection points of the S curve (perhaps there is a better way to do this).
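With exactly two points and two parameters, the logistic fit is fully determined, so it can also be solved in closed form instead of with `curve_fit`: taking the logit of both sides gives logit(y) = k(x - x0), i.e. two linear equations in k and x0. A minimal sketch, where the helper name `logistic_through_two_points` is hypothetical:

```python
import numpy as np
from scipy.special import expit, logit


def logistic_through_two_points(xa, xb, ya=0.1, yb=0.9):
    """Return (x0, k) such that expit(k * (x - x0)) passes through (xa, ya) and (xb, yb).

    Uses logit(y) = k * (x - x0), which turns the two points into two linear equations.
    """
    la, lb = logit(ya), logit(yb)
    k = (lb - la) / (xb - xa)  # slope in logit space
    x0 = xa - la / k           # x where the curve crosses 0.5
    return x0, k


x0, k = logistic_through_two_points(0.22, 1.34)
print(x0, k)
print(expit(k * (np.array([0.22, 1.34]) - x0)))  # reproduces the targets 0.1 and 0.9
```

Because logit(0.1) = -logit(0.9) = -ln 9, the center x0 always lands exactly at the midpoint of A and B for this choice of targets, which is a direct consequence of the logistic curve's symmetry.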
Then, the plot:
```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 1000)
y = sigmoid(x, *popt)
plt.figure()
plt.plot(xdata, ydata, 'o', label='10th/90th percentiles')
plt.plot(x, y, label='sigmoid curve')
plt.ylim(0, 1.3)
plt.legend(loc='best')
plt.show()
```
This yields the following figure (ignore the "percentiles" label in the legend; those are my A/B points):
The shape is not great. Near 0 in particular, the transition is not nearly as smooth and gradual as I would like. I would basically like to shift the function to the right to get a smoother curve, while still passing through A and B at the inflection points. Do you have any suggestions on how to achieve that? Adding a shift to the definition of the sigmoid would not work, as the offset would just be overwritten by the curve fit. Is there a smarter approach to this problem that I am missing?
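The problem with your model is its symmetry: your dataset requires a strongly asymmetric sigmoid. A logistic curve is point-symmetric about its center x0, and mapping A and B to 0.1 and 0.9 puts that center at their midpoint (0.78). By symmetry, its value at x = 0 equals 1 minus its value at x = 1.56, which is still noticeably short of 1, so the curve cannot get close to 0 at the origin.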
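To make the symmetry concrete, the sketch below reuses the question's `sigmoid` with the closed-form parameters for the 0.1/0.9 mapping (an assumption that mirrors the question's setup) and checks two things: that sigmoid(x0 + d) + sigmoid(x0 - d) = 1 for any offset d, and what the curve's value is at the origin.

```python
import numpy as np


def sigmoid(x, x0, k):
    # Logistic function from the question
    return 1 / (1 + np.exp(-k * (x - x0)))


# Exact logistic through (0.22, 0.1) and (1.34, 0.9): logit(0.9) = -logit(0.1) = ln 9
k = 2 * np.log(9) / (1.34 - 0.22)
x0 = (0.22 + 1.34) / 2

d = np.linspace(0, 3, 7)
# Point symmetry about (x0, 0.5): mirrored values always add up to 1
print(sigmoid(x0 + d, x0, k) + sigmoid(x0 - d, x0, k))

# Value at the origin: small, but clearly not 0
print(sigmoid(0.0, x0, k))
```

The value at x = 0 comes out at about 0.045, which is exactly the lack of a smooth approach to 0 visible in the question's plot; no choice of x0 and k can fix it while still hitting 0.1 and 0.9 at A and B.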
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize, special, stats
```
Let's add two almost innocuous points to your data, so that we have enough points to fit models with two or more parameters and to make the curve pass through the origin.
```python
ydata = [0, 0.1, 0.9, 1]
xdata = [0, 0.22, 1.34, 5]
p0 = [np.median(xdata), 1.]  # initial guess, reused for every model below
sigma = [0.1, 0.01, 0.01, 100]
```
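We also add weights (sigmas) so that the fit gives priority to your two points (sigma = 0.01) and then to the origin (sigma = 0.1); the very large sigma on (5, 1) makes that point almost irrelevant.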
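`curve_fit` treats `sigma` as per-point uncertainties: it minimizes the sum of squared residuals divided by sigma, so each point's weight is 1/sigma**2. With the values above, A and B weigh 100 times more than the origin, and the point at x = 5 is effectively ignored. The sketch below checks this by solving the same weighted objective explicitly with `scipy.optimize.least_squares`; it re-declares a logistic `model1` of the same form as the one used in this answer, and the helper `weighted_residuals` is hypothetical.

```python
import numpy as np
from scipy import optimize, special

xdata = np.array([0, 0.22, 1.34, 5])
ydata = np.array([0, 0.1, 0.9, 1])
sigma = np.array([0.1, 0.01, 0.01, 100])
p0 = [np.median(xdata), 1.0]


def model1(x, k, x0):
    # Symmetric logistic model, same form as in the answer
    return special.expit(k * (x - x0))


# Fit with per-point uncertainties
popt, _ = optimize.curve_fit(model1, xdata, ydata, p0=p0, sigma=sigma)


def weighted_residuals(p):
    # Same objective as curve_fit: residuals scaled by 1 / sigma
    return (model1(xdata, *p) - ydata) / sigma


res = optimize.least_squares(weighted_residuals, x0=p0, method="lm")
print(popt, res.x)
print("relative weights:", 1 / sigma**2)
```

Both solvers should land on the same parameters, which confirms that `sigma` acts as an inverse weight on the residuals rather than as a hard constraint.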
Now we can compare your model (which is symmetric):
```python
def model1(x, k, x0):
    return special.expit(k * (x - x0))

popt1, pcov1 = optimize.curve_fit(model1, xdata, ydata, p0=p0, sigma=sigma)
# array([3.92654466, 0.78030023])
```
with two asymmetric sigmoids (the CDFs of asymmetric distributions are good candidates for this). We chose, respectively, the inverse Weibull (also known as Fréchet) and the log-normal distributions:
```python
def model2(x, c, loc):
    return stats.invweibull(c=c, loc=loc).cdf(x)

popt2, pcov2 = optimize.curve_fit(model2, xdata, ydata, p0=p0, sigma=sigma)
# array([ 3.48553148, -0.56719092])
```
```python
def model3(x, s, loc):
    return stats.lognorm(s=s, loc=loc).cdf(x)

popt3, pcov3 = optimize.curve_fit(model3, xdata, ydata, p0=p0, sigma=sigma)
# array([ 0.41684656, -0.36610887])
```
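To compare the three fits numerically, the sketch below plugs the parameter values printed above back into the three models and evaluates them at the origin and at A and B. The expectation is that all three hit roughly 0.1 and 0.9 at A and B, while the asymmetric CDFs stay much closer to 0 at x = 0 than the logistic curve.

```python
import numpy as np
from scipy import special, stats

# Parameter values reported above for each fitted model
popt1 = [3.92654466, 0.78030023]   # model1: (k, x0)
popt2 = [3.48553148, -0.56719092]  # model2: (c, loc)
popt3 = [0.41684656, -0.36610887]  # model3: (s, loc)

models = {
    "logistic": lambda x: special.expit(popt1[0] * (x - popt1[1])),
    "invweibull": lambda x: stats.invweibull(c=popt2[0], loc=popt2[1]).cdf(x),
    "lognorm": lambda x: stats.lognorm(s=popt3[0], loc=popt3[1]).cdf(x),
}

x_check = np.array([0.0, 0.22, 1.34])  # origin, A, B
values = {name: f(x_check) for name, f in models.items()}
for name, v in values.items():
    print(f"{name:>10}: f(0)={v[0]:.4f}  f(A)={v[1]:.3f}  f(B)={v[2]:.3f}")
```

With these parameters the logistic curve is still around 0.045 at the origin, while the log-normal fit is below 0.01 and the inverse Weibull fit is below 0.001, even though all of them pass through A and B.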
It renders as follows: