I have many datasets which start at a higher y-axis value, then decrease. This happens in traditional S-curve fashion, so it actually looks like a reverse sigmoid. I successfully flipped the y-data so that the maximum value is at the bottom in the y-axis, and it decreases upward. So, data plots fine. The curve now looks like a standard sigmoid.
The Big Issue: I cannot fit the sigmoid to it. I have done everything I can think of for a few days.
The best result I have is in the attached image. I have many fits that just look incomplete and nowhere close to what a sigmoid fit should be.
Example of best result it plots: (https://i.sstatic.net/NPVry.png)
The code I have:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# 1. Data set:
x_data = np.array([130, 125, 120, 115, 110, 105, 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40,
35, 30, 25])
y_data = np.array([81, 81, 80.5, 79, 77.5, 76, 74.5, 71.5, 68.5, 67, 64, 59.5, 55, 48.5, 44, 42.5, 42,
41.5, 41.5, 41, 41, 41])
# 2. Reverse the y-values:
y_data_reversed = np.flip(y_data)
y_range = np.linspace(41, 61.75, 81)
# 3. Sigmoid function defined:
def sigmoid(x, U, k, a, b):
b = 81 #-the maxmimum y_data value.
return (U / (1 + np.exp(-k * (x - 77.5))))**a + b
p0 = [np.max(y_data), np.median(x_data), 1, np.min(y_data)] #-as an initial guess for the fit
calculation.
popt, pcov = curve_fit(sigmoid, x_data, y_data, p0, maxfev = 10000) #method = 'lm')
# 4. Sigmoid fit to data:
x_fit = np.linspace(25, 77.5, 130) #-min(x_data), np.median(x_data), max(x_data)).
y_fit = sigmoid(x_fit, *popt)
# 5. Plot data and sigmoid fit:
plt.plot(x_data, y_data_reversed, marker='o', linestyle='', color='b', label='Data')
plt.plot(x_fit, y_fit, color='r', label='Sigmoid Fit')
plt.xlabel('Temperature (°C)')
plt.ylabel('Length (mm)')
plt.title('Sigmoid Fit for Contraction Data')
plt.gca().invert_yaxis() # ...Invert y-axis to start with highest value at the top
plt.legend()
plt.grid(True)
plt.show()
In general, making an exponent a decision-variable, as in the a
argument in your sigmoid function, is dangerous because you can easily cause numerical over/underflow or even generate complex numbers depending on what values the optimizer chooses for the parameters.
Your b
parameter is redundant because you set b
to a fixed value inside the sigmoid function. Was that intentional?
Here's a simplified model I fitted to your data without the a
parameter.
I didn't flip the y-data and the y-axis of the plot as it is just a source of confusion. Instead, I put a minus sign inside the sigmoid function.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# 1. Data set:
x_data = np.array([
130, 125, 120, 115, 110, 105, 100, 95, 90, 85,
80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25
])
y_data = np.array([
81, 81, 80.5, 79, 77.5, 76, 74.5, 71.5, 68.5, 67,
64, 59.5, 55, 48.5, 44, 42.5, 42, 41.5, 41.5, 41, 41, 41
])
# 3. Sigmoid function defined:
def sigmoid(x, U, k, b):
a = 1
return -(U / (1 + np.exp(-k * (x - 77.5))))**a + b
p0 = [40, -0.1, 81]
x_fit = np.linspace(25, 130, 101)
y_init = sigmoid(x_fit, *p0)
popt, pcov = curve_fit(sigmoid, x_data, y_data, p0, maxfev = 10000)
y_est = sigmoid(x_data, *popt)
mse = np.mean((y_data - y_est) ** 2)
print(f"Mean-squared error: {mse:.3f}")
# 4. Sigmoid fit to data:
y_fit = sigmoid(x_fit, *popt)
# 5. Plot data and sigmoid fit:
plt.plot(x_data, y_data, marker='o', linestyle='', color='b', label='Data')
plt.plot(x_fit, y_init, color='r', label='Initial guess')
plt.plot(x_fit, y_fit, color='g', label='Sigmoid Fit')
plt.xlabel('Temperature (°C)')
plt.ylabel('Length (mm)')
plt.title('Sigmoid Fit for Contraction Data')
plt.legend()
plt.grid(True)
plt.show()
Output:
Mean-squared error: 1.679
Additional Result
You can try including the a
parameter and see what happens. I managed to get a good fit with the following 5-parameter model:
def sigmoid(x, U, k, a, b, x0):
return -((U / (1 + np.exp(-k * (x - x0))))**a) + b
p0 = [40, -0.1, 1.0, 81, 77.5]
Output:
Mean-squared error: 0.227
But use some caution: In this case
U
was equal to 4.59693084e+19 and a
was 85.8494729 !!! This model will probably not behave as expected when extrapolating outside the region of the data you have...
A better strategy is probably to set the a
parameter manually as in the previous code and run the optimization a few times with different a
values until you get a reasonable solution.