Find x-value where y = 0 for extended regression

I have following dataset:

ID        DATE            VALUE
1         01-01-2020      98
1         02-01-2020      96
1         03-01-2020      94
2         01-01-2020      99
2         02-01-2020      90
2         05-01-2020      85

For each id I want to calculate the date where the value will reach 0 through linear regression. I succeeded in plotting an extended line, where I can visually through following code see the point where it reaches 0, but I don't succeed in extracting the exact date.

For now my code looks like this:

# convert date to ordinal 
df['DATE_ord']=df['DATE'].map(dt.datetime.toordinal)

# Create plot with current values
ax = sns.regplot(data=df,x='DATE_ord',y='VALUE')

# Set x-axis values (ordinal dates)
date_range = np.arange(730000, 760000, 1000)

# Calculate extended values
slope, intercept, r_value, p_value, std_err = stats.linregress(x = df[
        "DATE_ord"].values, y = df["VALUE"].values)

line = [slope*xi + intercept for xi in date_range]

plt.plot(date_range, line, label="Fitting Line", linewidth=1)

# Convert x-axis values back to date
ax.set_xlabel('date')
new_labels = [date.fromordinal(int(item)) for item in ax.get_xticks()]
ax.set_xticklabels(new_labels)

I now need to find the date where the Y-value (VALUE) reaches 0. I found this post which more or less wants to achieve the same, but this doesn't work because the values in my slope are a different number as the predicted values for my x-axis, and I'm at a loss on how to make this work. The only thing I can think of is to calculate it with the regression expression and to fill in the numbers, but I'm not sure how to do this in a way that would work in a for-loop which I still need to create to loop through the different ID's.

Solution

Let's try to do this with pandas only, without looping

Load sample data

import pandas as pd
from io import StringIO
data = StringIO(
'''
ID        DATE            VALUE
1         01-01-2020      98
1         02-01-2020      96
1         03-01-2020      94
2         01-01-2020      99
2         02-01-2020      90
2         05-01-2020      85
''')
df = pd.read_csv(data, sep = '\s+')
df['DATE'] = pd.to_datetime(df['DATE'], dayfirst = True)

generate ordinals and calculate regression coefficients per group:

import datetime as dt
from scipy import stats
df['DATE_ord']=df['DATE'].map(dt.datetime.toordinal)
dfr = (df.groupby('ID').apply(lambda g: stats.linregress(x = g[
        "DATE_ord"].values, y = g["VALUE"].values)[0:2])
  .apply(pd.Series)
  .rename(columns = {0 : 'slope', 1:'intercept'}))
dfr

we get this


     slope      intercept
ID      
1   -2.000000   1.474948e+06
2   -3.076923   2.269096e+06

Now the 'zero' date is just as a solution to 0 = s * x + i which is x0 = -intercept/slope. we calculate that in ordinals and convert back to a date

dfr['zd_ord'] = -dfr['intercept']/dfr['slope']
dfr['zd'] = dfr['zd_ord'].astype(int).map(dt.datetime.fromordinal)

to get


     slope      intercept       zd_ord      zd
ID              
1   -2.000000   1.474948e+06    737474.00   2020-02-19
2   -3.076923   2.269096e+06    737456.35   2020-02-01