Multiple OLS estimation TypeError


I am trying to run an OLS regression with Newey-West standard errors in statsmodels to estimate my parameters. This is my code:

from __future__ import print_function, division 
import xlrd as xl
import numpy as np
import scipy as sp
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm

file_loc = "/Python/dataset_3.xlsx"
workbook = xl.open_workbook(file_loc)
sheet = workbook.sheet_by_index(0)
tot = sheet.nrows
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)]
        for r in range(sheet.nrows)]

rv1 = []
rv5 = []
rv22 = []
rv1fcast = []
T = []
price = []
time = []
retnor = []

for i in range(1, tot):        
    t = data[i][0]
    ret = data[i][1]
    ret5 = data[i][2]
    ret22 = data[i][3]
    ret1_1 = data[i][4]
    retn = data[i][5]
    t = xl.xldate_as_tuple(t, 0)
    rv1.append(ret)
    rv5.append(ret5)
    rv22.append(ret22)
    rv1fcast.append(ret1_1)
    retnor.append(retn)
    T.append(t)

df = pd.DataFrame({'RVFCAST':rv1fcast, 'RV1':rv1, 'RV5':rv5, 'RV22':rv22,})
df = df[df.RV1.notnull()]
model = smf.OLS(formula = 'df.RVFCAST ~ df.RV1 + df.RV5 + df.RV22', data = df)

Everything looks fine when I inspect the arrays or my DataFrame, but the call just raises: TypeError: __init__() takes at least 2 arguments (1 given)

I have tried a bunch of different methods and I cannot see what I am missing.

When I run it, the following error message is shown:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Python/harrv.py in <module>()
     41 df = df[df.RV1.notnull()]
     42 
---> 43 model = smf.OLS(formula = 'df.RVFCAST ~ df.RV1 + df.RV5 + df.RV22', data = df)
     44 
     45 #mdl = model.get_robustcov_results(cov_type='HAC',maxlags=1)

TypeError: __init__() takes at least 2 arguments (1 given) 

printing rv1 gives you:

Out[318]: 
[0.015538008996147568,
 0.008881670570720125,
 0.010421778063375802,    
.....    
 0.003151044550868834,
 0.0029676428110974166,
 0.005236329928710288,
 0.004838460533164701,
 '']

The other rv lists contain similar floating-point numbers. The df just assembles them the way pd.DataFrame does, which according to the documentation is supported (http://statsmodels.sourceforge.net/devel/example_formulas.html).


Solution

  • The problem is that the formula functions in statsmodels.formula.api are lowercase. The uppercase OLS there is the same class as in the main statsmodels.api, and it expects the endog (and exog) arrays rather than a formula, which is why the call fails with the TypeError. The uppercase models will be removed from the formula.api namespace in the future to avoid exactly this confusion.

    That means you need to use the lowercase ols, as in

    model = smf.ols(formula = 'df.RVFCAST ~ df.RV1 + df.RV5 + df.RV22', data = df)

    Note that the lowercase formula functions are just aliases for the from_formula methods of the models:

    smf.ols is a shortcut for sm.OLS.from_formula