python statistics regression poisson dummy-variable

Fixing 'TypeError' in Poisson Regression (using Python)


I'm running a Poisson Regression in Python, and it's throwing the following error:

TypeError: from_formula() takes at least 4 arguments (3 given)

How can I fix it? My code is as follows:

from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.cov_struct import (Exchangeable,
    Independence, Autoregressive)
from statsmodels.genmod.families import Poisson

# 'df' is the dataframe containing all the data
f1 = "net_unique_bids ~ city1 + city2 + city3 + city4 + item_category1 + item_category2 + item_category3 + item_condition1 + item_condition2 + item_condition3 + asking_price + description_char_count + num_of_photos" 
model1 = GEE.from_formula(formula=f1, data=df, cov_struct=Independence(), family=Poisson())

Background: I'm modeling bids received (the dependent variable) on my auction website using features such as city (categorical), item_category (categorical), asking_price (continuous), num_of_photos (continuous), etc.

My overall aim is to find which features have the highest influence on bids received. This way, I can focus my energies on improving features that matter the most.


Solution

  • According to the documentation, from_formula has the following signature:

    def from_formula(cls, formula, groups, data, subset=None, 
                    time=None, offset=None, exposure=None, *args, **kwargs): 
    

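    You did not pass groups, which is a required argument, so the call raises the TypeError. The method needs cls, formula, groups and data (four arguments), and your call supplies only three of them.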

    groups : array-like or string. Array of grouping labels. If a string, this is the name of a variable in data that contains the grouping labels.

    If your data has no natural grouping, you could try passing the index of your dataframe as groups, so that each row is its own group. Otherwise, you could group by city, which splits the data into four clusters; that would require taking city1...city4 out of the formula. I am not sure which of the two suits your needs.
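The two options above can be sketched roughly as follows. This is only an illustration under assumptions: the dataframe is synthetic, the columns row_id and city_id are hypothetical names introduced for the example (the original data uses city1...city4 dummies instead), and only a few of the original predictors are included.

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.cov_struct import Independence
from statsmodels.genmod.families import Poisson

# Synthetic stand-in for 'df'; every value here is made up for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "city_id": rng.integers(0, 4, size=n),         # hypothetical: one id per city
    "asking_price": rng.uniform(10, 100, size=n),
    "num_of_photos": rng.integers(0, 10, size=n),
})
df["net_unique_bids"] = rng.poisson(np.exp(0.5 + 0.1 * df["num_of_photos"]))

# Option 1: no real grouping, so each row forms its own group.
df["row_id"] = np.arange(n)                        # hypothetical helper column
f_rows = "net_unique_bids ~ C(city_id) + asking_price + num_of_photos"
model_rows = GEE.from_formula(f_rows, groups="row_id", data=df,
                              cov_struct=Independence(), family=Poisson())
result_rows = model_rows.fit()

# Option 2: group by city, with the city terms removed from the formula.
f_city = "net_unique_bids ~ asking_price + num_of_photos"
model_city = GEE.from_formula(f_city, groups="city_id", data=df,
                              cov_struct=Independence(), family=Poisson())
result_city = model_city.fit()

print(result_rows.params)
print(result_city.params)
```

With an independence structure and one row per group, the coefficient estimates should match those of an ordinary Poisson GLM, so the first option mainly changes how the standard errors are computed. With only four city clusters, the standard errors from the second option are likely to be unreliable, so treat that variant with caution.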