python logistic-regression offset statsmodels

How to set an offset for multiple variables at the same time in a statsmodels Logit model?


I'm trying to fit a logit model using statsmodels.discrete.discrete_model.Logit where the coefficients for some variables are already known but must be estimated for the others. I can get the code working when offsetting a single variable, but I haven't been able to figure out how to do it for several variables at the same time.

This works for a single variable offset:

import numpy as np
import pandas as pd
import statsmodels.discrete.discrete_model as smdm

df = pd.DataFrame(np.random.randn(8, 4), columns=list('yxza'))
labels = np.random.randint(2, size=8)

known = 0.2

model_train = smdm.Logit(labels, df[['y', 'x', 'a']], offset=known*df['z']).fit()

But this doesn't work for multiple offsets:

import numpy as np
import pandas as pd
import statsmodels.discrete.discrete_model as smdm

df = pd.DataFrame(np.random.randn(8, 4), columns=list('yxza'))
labels = np.random.randint(2, size=8)

known = [0.2, 0.1]

model_train = smdm.Logit(labels, df[['y', 'x']], offset=known*df[['z', 'a']]).fit()

It produces the following error:

ValueError: Unable to coerce to Series, length must be 2: given 8

I have tried several different ways to set the offset argument, for example offset=[0.2*df['z'], 0.1*df['a']], but I keep getting an exception.
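
A quick shape check may help narrow this down. The sketch below uses only numpy and pandas and reuses the column names and coefficients from the snippets above. It suggests that multiplying the two-column frame by the list gives a two-column result rather than one value per observation, which is probably what the model cannot handle. The variable names `scaled` and `offset` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=list('yxza'))
known = [0.2, 0.1]

# Element-wise scaling: each column is multiplied by its coefficient,
# but the result still has one column per variable.
scaled = df[['z', 'a']] * known
print(scaled.shape)  # (8, 2)

# Summing across the columns collapses it to one value per observation.
offset = scaled.sum(axis=1)
print(offset.shape)  # (8,)
```

The summed series has the same length as the labels, matching the shape of the single-variable offset that worked.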


Solution

  • Thanks to the comments from @Josef, I was able to get it working by adding the known terms together into a single offset series. The code is as follows:

    import numpy as np
    import pandas as pd
    import statsmodels.discrete.discrete_model as smdm
    
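    df = pd.DataFrame(np.random.randn(8, 4), columns=list('yxza'))
    labels = np.random.randint(2, size=8)
    
    # Combine the known terms into one offset value per observation.
    known = 0.2 * df['z'] + 0.1 * df['a']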
    
    model_train = smdm.Logit(labels, df[['y', 'x']], offset=known).fit()
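
With more than a couple of known coefficients, writing the sum out by hand gets tedious. One possible generalization, sketched here under the assumption that the known coefficients are stored in a pandas Series keyed by column name, is to compute the offset as a matrix-vector product. The names `known_coefs` and `free_cols` are hypothetical and not part of statsmodels.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), columns=list('yxza'))

# Hypothetical mapping of column name -> known coefficient
# (values taken from the question).
known_coefs = pd.Series({'z': 0.2, 'a': 0.1})

# Matrix-vector product aligned on column names:
# one offset value per row of df.
offset = df[known_coefs.index].dot(known_coefs)

# Columns whose coefficients still have to be estimated.
free_cols = [c for c in df.columns if c not in known_coefs.index]
```

The result can then be passed to the model in the same way as above, for example `smdm.Logit(labels, df[free_cols], offset=offset)`.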