Difference between predict and fittedvalues in statsmodels


I have a very basic question that I somehow cannot find a clear answer to.

Assuming I have a model:

import statsmodels.formula.api as smf
model = smf.ols(....).fit()

What is the difference between model.fittedvalues and model.predict?


Solution

  • model.predict is a method for predicting values, so you can provide it an unseen dataset:

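    import statsmodels.formula.api as smf
    import pandas as pd
    import numpy as np

    # Toy data: 100 rows of standard-normal noise in columns X and Y
    df = pd.DataFrame(np.random.randn(100, 2), columns=['X', 'Y'])

    # Fit Y on X (the formula API adds an intercept automatically)
    model = smf.ols('Y ~ X', data=df).fit()

    # Predict Y for X values that were not in the training data
    model.predict(exog=pd.DataFrame({'X': [1, 2, 3]}))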
    

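    If you do not provide the exog argument, it predicts on the data stored in the model object, i.e. the data used for fitting. You can see this in the source code of the underlying model's predict method, to which the results object passes its fitted params: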

    def predict(self, params, exog=None):
        """
        Return linear predicted values from a design matrix.

        Parameters
        ----------
        params : array_like
            Parameters of a linear model.
        exog : array_like, optional
            Design / exogenous data. Model exog is used if None.

        Returns
        -------
        array_like
            An array of fitted values.

        Notes
        -----
        If the model has not yet been fit, params is not optional.
        """
        # JP: this does not look correct for GLMAR
        # SS: it needs its own predict method

        if exog is None:
            exog = self.exog

        return np.dot(exog, params)
    

    On the other hand, model.fittedvalues is a property that holds the fitted values for the data the model was trained on. It contains the same values as model.predict() called with no arguments, for the reason explained above.

    You can also look at the other methods and attributes of this results type (RegressionResults) in the statsmodels documentation.
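
To check the equivalence yourself, here is a minimal sketch. It assumes a recent statsmodels with the formula API and uses made-up random data; the variable names (in_sample, explicit, manual, new_data) are only for illustration. It compares model.fittedvalues against model.predict() with no arguments, against model.predict(df) on the training frame, and against the np.dot(exog, params) product from the source quoted above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative random data (an assumption, not from the question)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=["X", "Y"])

model = smf.ols("Y ~ X", data=df).fit()

# Three ways to get in-sample predictions
in_sample = model.predict()      # no exog: falls back to the stored training data
explicit = model.predict(df)     # training data passed explicitly
manual = np.dot(model.model.exog, model.params)  # design matrix times coefficients

# Compare values only; the container type (Series vs. ndarray) may differ
same_default = np.allclose(model.fittedvalues, in_sample)
same_explicit = np.allclose(model.fittedvalues, explicit)
same_manual = np.allclose(model.fittedvalues, manual)

# Out-of-sample prediction: only predict can do this
new_data = pd.DataFrame({"X": [1, 2, 3]})
out_of_sample = model.predict(new_data)
by_hand = model.params["Intercept"] + model.params["X"] * new_data["X"]

print(same_default, same_explicit, same_manual)
print(np.allclose(out_of_sample, by_hand))
```

In short: use fittedvalues when you only need the in-sample fit, and predict when you want to score new rows. The comparison uses np.allclose because the two may come back in different containers even though the numbers agree.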