Tags: python, r, linear-regression, statsmodels

Vectorized linear model


Using lm() in R, I can do the following:

    fit <- lm(organ_volumes ~ sex + genotype, data = factors)

where organ_volumes is a matrix in which each column is a different response variable. Each column is then fitted with its own linear model, as described in the lm documentation:

If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix.
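In matrix form, assuming a design matrix $X$ ($n \times p$, full column rank) and a response matrix $Y$ ($n \times m$, one column $y_j$ per response), the least-squares objective is a sum of per-column squared errors, so the joint solution is exactly the stack of the separate per-column solutions:

```latex
\hat{B} = \arg\min_{B} \lVert Y - XB \rVert_F^2
        = \arg\min_{B} \sum_{j=1}^{m} \lVert y_j - X\beta_j \rVert_2^2
        = (X^\top X)^{-1} X^\top Y,
\qquad
\hat{\beta}_j = (X^\top X)^{-1} X^\top y_j \quad (j = 1, \dots, m)
```

This is why a single matrix solve can replace a loop: the factor $(X^\top X)^{-1} X^\top$ is shared by every column.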

Is there any way to do something similar in Python using statsmodels rather than having to loop over each column, which is much slower than the R method?
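A minimal NumPy sketch of the same idea, assuming the goal is only the coefficients. The `factors` and `organ_volumes` data below are hypothetical random stand-ins for the R objects, and `pd.get_dummies(..., drop_first=True)` is used to build a treatment-coded design matrix similar to what R produces for `~ sex + genotype`. `np.linalg.lstsq` accepts a 2-D right-hand side and solves every column in one call:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the R `factors` data frame and the
# `organ_volumes` response matrix (random values, illustration only).
rng = np.random.default_rng(0)
n = 40
factors = pd.DataFrame({
    "sex": rng.choice(["female", "male"], size=n),
    "genotype": rng.choice(["het", "ko", "wt"], size=n),
})
organ_volumes = rng.normal(size=(n, 5))  # one column per organ

# Treatment-coded design matrix: an intercept column plus one 0/1 dummy
# per non-reference level (the first level of each factor is dropped).
dummies = pd.get_dummies(factors[["sex", "genotype"]], drop_first=True, dtype=float)
X = np.column_stack([np.ones(n), dummies.to_numpy()])

# One least-squares call fits all response columns at once;
# coef has shape (number of model terms, number of organs).
coef, residuals, rank, _ = np.linalg.lstsq(X, organ_volumes, rcond=None)
print(coef.shape)
```

This returns coefficients only; it does not produce the standard errors or tests that a per-column statsmodels OLS fit would report.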


Solution

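  • You can use scikit-learn's LinearRegression, which accepts a two-dimensional target and fits each target column by ordinary least squares, much like lm() does with a matrix response. Because the columns are fitted independently, correlation between the dependent variables does not change the coefficients: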

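    import pandas as pd
    from sklearn import linear_model
    from sklearn.datasets import load_iris

    # Load iris into a DataFrame with named feature columns
    iris = load_iris()
    df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])

    # Two predictors and two responses; LinearRegression accepts a 2-D target
    clf = linear_model.LinearRegression()
    X = df[['sepal length (cm)', 'sepal width (cm)']]
    Y = df[['petal length (cm)', 'petal width (cm)']]
    clf.fit(X, Y)

    # One row per response column, one column per predictor
    # (the intercepts are stored separately in clf.intercept_)
    clf.coef_

    array([[ 1.77559255, -1.33862329],
           [ 0.723292  , -0.47872132]])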
    

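    In R (note that this call regresses Sepal.Length and Petal.Length on Sepal.Width and Petal.Width, a different split of the columns from the Python code, so the coefficients are not directly comparable; lm(data[, c(3, 4)] ~ data[, c(1, 2)]) would fit the same model as the Python code):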

    data = as.matrix(iris[,-5])
    lm(data[,c(1,3)] ~ data[,c(2,4)])
    
    Call:
    lm(formula = data[, c(1, 3)] ~ data[, c(2, 4)])
    
    Coefficients:
                                Sepal.Length  Petal.Length
    (Intercept)                  3.4573        2.2582     
    data[, c(2, 4)]Sepal.Width   0.3991       -0.3550     
    data[, c(2, 4)]Petal.Width   0.9721        2.1556
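To check that correlated responses do not alter ordinary least-squares coefficients, here is a small NumPy sketch on synthetic data (the coefficients and noise levels are arbitrary assumptions). It compares one joint `np.linalg.lstsq` call on a two-column response with two separate single-column fits:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Design matrix: intercept plus two random predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

# Two strongly correlated responses: y2 is mostly a scaled copy of y1
y1 = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
y2 = 0.9 * y1 + rng.normal(scale=0.1, size=n)
Y = np.column_stack([y1, y2])

# Joint fit: every response column solved in one call
B_joint = np.linalg.lstsq(X, Y, rcond=None)[0]

# Separate fits: one call per response column
B_sep = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(Y.shape[1])]
)

print(np.corrcoef(y1, y2)[0, 1])   # correlation between the two responses
print(np.allclose(B_joint, B_sep)) # True
```

The two fits agree regardless of how correlated the responses are, so any mismatch between Python and R output points to a difference in the model specification (which columns are responses and predictors, or whether an intercept is included).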