python pandas linear-regression polynomials

Linear Regression - re-scaled to not go over max value


I have a polynomial regression model that outputs its predicted values ('predicted_rev_running_total') in a data frame. The column is supposed to be a running total along a project timeline that runs from 0 to 1. I reordered 'predicted_rev_running_total' from smallest to largest. My dilemma now is how to scale it so that it resembles something like the 'new_predicted_rev_running_total' column, without going over the maximum value.


Solution

  • I think this is what you want. It's a simple two-step process:

    1. create a normalized value of the predicted column (just divide by max per group)

    2. multiply the normalized value by the contract value

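    # first create a normalized value of the predicted column;
    # transform("max") returns each group's max aligned to the original rows,
    # so the assignment does not depend on how groupby.apply indexes its result
    df['normalized_predicted'] = df['predicted_rev_running_total'] / df.groupby("project_timeline")["predicted_rev_running_total"].transform("max")

    # then, multiply it by the bill contract (plain column arithmetic, no row-wise apply needed)
    df['new_predicted_rev_running_total'] = df['normalized_predicted'] * df['Guaranteed Bill Contract Amt (Max Value)']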
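
    To check the two steps on something small, here is a minimal self-contained sketch. The toy numbers, the `rescale_to_contract` helper, and the `project_id` grouping column are all made up for illustration; the snippet above groups on "project_timeline", so pass whichever column identifies one running-total series in your data.

```python
import pandas as pd


def rescale_to_contract(df, group_col, pred_col, contract_col):
    """Scale each group's predictions so the group's max equals its contract amount."""
    # Step 1: normalize by the per-group maximum (largest value becomes 1.0)
    group_max = df.groupby(group_col)[pred_col].transform("max")
    normalized = df[pred_col] / group_max
    # Step 2: multiply by the contract value for that row
    return normalized * df[contract_col]


# Toy data, invented for this example: two projects, three timeline points each
toy = pd.DataFrame({
    "project_id": ["A", "A", "A", "B", "B", "B"],  # hypothetical grouping column
    "project_timeline": [0.0, 0.5, 1.0, 0.0, 0.5, 1.0],
    "predicted_rev_running_total": [10.0, 60.0, 120.0, 5.0, 20.0, 40.0],
    "Guaranteed Bill Contract Amt (Max Value)": [100.0] * 3 + [50.0] * 3,
})

toy["new_predicted_rev_running_total"] = rescale_to_contract(
    toy,
    "project_id",
    "predicted_rev_running_total",
    "Guaranteed Bill Contract Amt (Max Value)",
)
print(toy)
```

    With this toy frame the last row of each project lands exactly on its contract amount (100 for A, 50 for B) and the earlier rows keep their proportions. One thing to watch: if a group's maximum prediction is 0, the division gives NaN or inf, so you may want to handle that case separately.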