Vectorized linear model

Using lm() in R, I can do the following:

fit <- lm(organ_volumes~sex+genotype, data=factors)

where organ_volumes is a matrix in which each column is a different variable. Each column is fit to a linear model in turn, as described in the lm docs:

If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix.

Is there any way to do something similar in Python using statsmodels rather than having to loop over each column, which is much slower than the R method?

Solution

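You can try the following in scikit-learn: LinearRegression accepts a 2-D target and fits each column by ordinary least squares in a single call. The coefficients only match R's lm() when both are given the same response and predictor columns: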

import pandas as pd
from sklearn import linear_model
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])

# Two predictors and two responses; each response column is fit separately.
X = df[['sepal length (cm)', 'sepal width (cm)']]
Y = df[['petal length (cm)', 'petal width (cm)']]
clf = linear_model.LinearRegression()
clf.fit(X, Y)
clf.coef_  # shape (n_targets, n_features): one row per response column

array([[ 1.77559255, -1.33862329],
       [ 0.723292  , -0.47872132]])
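If you would rather avoid the scikit-learn dependency, the same one-shot fit can be sketched with plain NumPy, because least squares with a matrix right-hand side solves every column at once. This is only a minimal sketch on synthetic data; `fit_columns`, `true_coef` and the generated `X` and `Y` are made-up names for illustration, not part of any library.

```python
import numpy as np

def fit_columns(X, Y):
    """Fit Y[:, j] ~ intercept + X for every column j in one lstsq call.

    Returns an array of shape (1 + n_predictors, n_responses); row 0 holds
    the intercepts, similar to the coefficient table printed by R's lm().
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef

# Synthetic data (purely illustrative): 200 rows, 2 predictors, 3 responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_coef = np.array([[1.0, -2.0, 0.5],    # intercepts
                      [0.3, 0.0, 1.5],     # slopes for predictor 0
                      [-1.0, 2.0, 0.0]])   # slopes for predictor 1
noise = 0.01 * rng.normal(size=(200, 3))
Y = np.column_stack([np.ones(200), X]) @ true_coef + noise

# One call for all response columns.
coef_all = fit_columns(X, Y)

# The explicit loop over columns, for comparison only.
coef_loop = np.column_stack(
    [fit_columns(X, Y[:, [j]]) for j in range(Y.shape[1])]
)
```

Here `coef_all` and `coef_loop` should agree to floating-point precision, which is the sense in which the matrix call is "vectorized": it solves the same per-column problems, just without the Python-level loop.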

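In R, for comparison. Note that this call uses columns 1 and 3 (Sepal.Length, Petal.Length) as responses and columns 2 and 4 (Sepal.Width, Petal.Width) as predictors, which is a different model from the scikit-learn fit above, so the coefficients differ: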

data = as.matrix(iris[,-5])
lm(data[,c(1,3)] ~ data[,c(2,4)])

Call:
lm(formula = data[, c(1, 3)] ~ data[, c(2, 4)])

Coefficients:
                            Sepal.Length  Petal.Length
(Intercept)                  3.4573        2.2582     
data[, c(2, 4)]Sepal.Width   0.3991       -0.3550     
data[, c(2, 4)]Petal.Width   0.9721        2.1556
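To compare like with like, the scikit-learn fit can be rerun with the same columns as the R call. The following is a sketch, assuming a scikit-learn version whose bundled iris data matches R's `iris`; `r_style` is just an illustrative name for the stacked coefficient table.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

# Columns: sepal length, sepal width, petal length, petal width.
data = load_iris()['data']
X = data[:, [1, 3]]  # predictors: sepal width, petal width (R columns 2 and 4)
Y = data[:, [0, 2]]  # responses: sepal length, petal length (R columns 1 and 3)

model = LinearRegression().fit(X, Y)

# coef_ has shape (n_targets, n_features); transpose it and stack the
# intercepts on top to get the same layout as R's coefficient table.
r_style = np.vstack([model.intercept_, model.coef_.T])
print(np.round(r_style, 4))
```

If the two datasets do match, the rows of `r_style` should line up with the (Intercept), Sepal.Width and Petal.Width rows of the R output above.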