I have a very basic question that I somehow cannot find a clear answer to.
Assuming I have a model:
import statsmodels.formula.api as smf
model = smf.ols(....).fit()
What is the difference between model.fittedvalues and model.predict?
model.predict is a method for predicting values, so you can provide it an unseen dataset:
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,2),columns=['X','Y'])
model = smf.ols('Y ~ X',data=df).fit()
model.predict(exog=pd.DataFrame({'X':[1,2,3]}))
If you do not provide the exog argument, it returns predictions for the data stored on the model object. You can see this in the source code:
def predict(self, params, exog=None):
    """
    Return linear predicted values from a design matrix.

    Parameters
    ----------
    params : array_like
        Parameters of a linear model.
    exog : array_like, optional
        Design / exogenous data. Model exog is used if None.

    Returns
    -------
    array_like
        An array of fitted values.

    Notes
    -----
    If the model has not yet been fit, params is not optional.
    """
    # JP: this does not look correct for GLMAR
    # SS: it needs its own predict method
    if exog is None:
        exog = self.exog
    return np.dot(exog, params)
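To make the logic of that method concrete, here is a minimal NumPy-only sketch of the same idea. It is not the statsmodels implementation: the helper `linear_predict`, the simulated data, and the use of `np.linalg.lstsq` to obtain the coefficients are all assumptions made for illustration.

```python
import numpy as np

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = 2.0 + 0.5 * x + rng.standard_normal(100)

# Design matrix with an explicit intercept column, like the one a formula 'Y ~ X' builds
exog = np.column_stack([np.ones_like(x), x])

# Least-squares coefficients (stand-in for the fitted params)
params, *_ = np.linalg.lstsq(exog, y, rcond=None)

def linear_predict(params, exog_new=None):
    """Hypothetical helper mirroring the logic shown above."""
    if exog_new is None:
        exog_new = exog  # fall back to the stored training design matrix
    return np.dot(exog_new, params)

# No new data: predictions for the training rows (the "fitted values")
in_sample = linear_predict(params)

# New data: rows must have the same columns as the design matrix (intercept, X)
new_rows = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
out_of_sample = linear_predict(params, new_rows)
```

With no new design matrix the function falls back to the stored one, which is why calling it without data gives the in-sample fitted values.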
On the other hand, model.fittedvalues is a property that holds the stored fitted values for the data the model was estimated on. Its values are the same as those returned by model.predict(), for the reasons explained above.
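A quick way to check this yourself is sketched below, reusing the toy setup from earlier with a seeded generator (the seed and the variable names are my own choices). The comparison goes through np.asarray because, depending on the statsmodels version, the two results may come back as different container types (a pandas Series versus a NumPy array) even though the numbers agree.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=["X", "Y"])
model = smf.ols("Y ~ X", data=df).fit()

fitted = model.fittedvalues    # stored in-sample fitted values
in_sample = model.predict()    # no exog: falls back to the data stored on the model

# Compare the numbers only, ignoring the container type
same_values = np.allclose(np.asarray(fitted), np.asarray(in_sample))
print(same_values)

# predict is only needed when you want predictions for new data
new_data = pd.DataFrame({"X": [1, 2, 3]})
new_pred = model.predict(new_data)
print(len(new_pred))
```

So in practice: use fittedvalues for the training data, and predict when you have new rows to score.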
You can also look at the other methods and attributes available on this results type.