
Python: Linear regression from Pandas df - ordinal dates conversion


This is my first attempt at forecasting with basic linear regression in Python. I discovered that I had to convert the dates to ordinal dates and then into a 2D numpy array. I now want to convert the numpy array back to YYYY/MMM/DD for a usable plot, but I am failing. I have never used numpy before, and x_full_month.map(dt.datetime.fromordinal) does not work, because numpy arrays do not have a .map method.

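    import datetime as dt
    import numpy as np
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    # Convert each date to its ordinal (an integer day count)
    df['Date_Ordinal'] = df['Date'].map(dt.datetime.toordinal)
    x = df['Date_Ordinal']
    y = df['Cost']
    # scikit-learn expects 2D arrays of shape (n_samples, 1)
    x_train = x.values.reshape(-1, 1)
    y_train = y.values.reshape(-1, 1)
    model.fit(x_train, y_train)  # the model must be fitted before predict() is called
    y_pred = model.predict(x_train)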

From the predictive model, I then create a new X of ordinal dates for the full month, to get a full month's response:

    x_full_month = np.arange(737850, 737880, 1).reshape((-1, 1))
    y_pred_new = model.predict(x_full_month)
    print('predicted response:', y_pred_new.T, sep='\n')

This seems to work, but X is still in ordinal dates (as expected). How would I get a nicely formatted X for plotting, or get this back into a pandas DataFrame, which I'm more familiar with? Or am I going about this in a completely roundabout way?

Edit: corrected parameter name


Solution

  • Several hours later, I have a solution. I'm still sure I'm going about this inefficiently, but the steps below do work for me.

        # pd.DataFrame(data, index): flatten() turns each (n, 1) array into 1D,
        # so the predictions become column 0 and the ordinal dates become the index
        df = pd.DataFrame(y_pred_new.flatten(), x_full_month.flatten())

        # Move the ordinal dates out of the index into a regular column
        df.reset_index(inplace=True)

        # Meaningful column names
        df = df.rename(columns={'index': 'ord_date', 0: 'cumul_DN'})

        # Convert ordinal date to yyyy-mm-dd
        df['date'] = df['ord_date'].map(dt.datetime.fromordinal)
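
    A more compact variant of the same idea might look like the sketch below. It is only an illustration: the name `forecast` is hypothetical, and `y_pred_new` is filled with made-up values standing in for the output of `model.predict(x_full_month)`, so that the snippet runs without a fitted model. It builds the DataFrame in one step and converts the ordinals with a list comprehension, since numpy arrays have no `.map`.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Ordinal dates for the forecast window, shaped (n, 1) as scikit-learn expects.
x_full_month = np.arange(737850, 737880, 1).reshape(-1, 1)

# Assumption: stand-in for model.predict(x_full_month). These are made-up
# values with the same (n, 1) shape, not real predictions.
y_pred_new = 2.0 * (x_full_month - 737850) + 100.0

# ravel() flattens each (n, 1) array to 1D. fromordinal() is given a plain
# Python int via int(o), and pandas stores the resulting datetimes as a
# datetime column.
forecast = pd.DataFrame({
    "date": [dt.datetime.fromordinal(int(o)) for o in x_full_month.ravel()],
    "cumul_DN": y_pred_new.ravel(),
})

print(forecast.head())
```

    With matplotlib installed, `forecast.plot(x="date", y="cumul_DN")` should then give a date-formatted x-axis, and `forecast["date"].dt.strftime("%Y/%b/%d")` would produce strings such as 2021/Mar/01 if a specific text format is needed.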