
ndim Error with Statsmodels Tweedie Model


I'm trying to fit a Tweedie model with statsmodels and keep getting the following error:

AttributeError: 'Tweedie' object has no attribute 'ndim'

formula = 'pure_premium ~ atfault_model + channel_model_DIR + channel_model_IA + CLded_model + credit_model_52778 + \
        credit_model_c6 + package_model_Elite + package_model_LBO + package_model_Plus + package_model_Savers + \
        package_model_Savers_Plus + Q("ds_fp_paid_in_full_eligiable-has discount") + ds_fp_paid_in_full_ineligable + \
        Q("ds_pn_prior_insurance_eligable-has discount") + ds_pn_prior_insurance_ineligable + \
        Q("ds_ip_advanced_purchase_eligiable-has discount") + ds_ip_advanced_purchase_ineligable + \
        credit_model_c5 + ds_ad_affinity + ds_ak_alliance + \
        ds_ly_loyalty_discount + ds_mo_multipolicy + ds_pf_performance + majorvio_model + \
        (driver_age_model*marital_status_model) + minorvio_model + multi_unit_model + \
        RATING_CLASS_CODE_MODEL + unit_drv_exp_model +  Vintiles + safety_course_model + instructor_course_model + \
        (class_model*v_age_model) + (class_model*cc_model) + state_model'

lost_cost_model = smf.ols(formula = formula, data = coll_df,
                          family = sm.families.Tweedie(link = sm.families.links.log, var_power = 1.5))

Every variable is either categorical, float, or int.

I'm not sure what is causing this.


Solution

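  • ols does not take a family; OLS is just linear regression. The unexpected family keyword is most likely being handled as if it were an extra data array, which would explain the complaint about ndim.

    You need to use the generalized linear model instead, i.e. GLM, or glm in the formula interface. GLM supports several families from the one-parameter exponential family and offers a selection of link functions for each.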

    Several other models are equivalent to GLM but are based on a different implementation and offer other options. Those models are written for specific family-link combinations and do not have an option to change them.

    OLS is GLM with Gaussian family and identity link.
    Logit is GLM with Binomial family and logit link, and only for binary response variables.
    Probit is GLM with Binomial family and probit link, and only for binary response variables.
    Poisson is GLM with Poisson family and log link.
    NegativeBinomial is a more general version of GLM with NegativeBinomial family and log link. discrete.NegativeBinomial allows for several parameterizations of the implied variance function and estimates the dispersion parameter jointly with the mean parameters by MLE.