r, statsmodels

Programmatically building an R linear regression formula in which 100 features each interact with one other feature


I need to train a regression model with 100 features, and I want to look for interaction effects between each of those 100 features and one other feature. I would like to do this programmatically, since this analysis is going to be recurring and I don't want to write a new formula each time it runs. I want it to be automated. So how can I get a model like this:

Y ~ a*b + a*c + ... + a*z

But for 100 terms? How do I get the R formula to do this? Note that I will be using statsmodels in Python, but I think the syntax is the same.


Solution

  • lm(Y ~ a * ., df)
    

    For example:

    lm(Sepal.Width ~ Sepal.Length * ., iris)
    
    Call:
    lm(formula = Sepal.Width ~ Sepal.Length * ., data = iris)
    
    Coefficients:
                       (Intercept)                    Sepal.Length                    Petal.Length                     Petal.Width  
                          -0.91350                         0.82954                         0.29569                         0.85334  
                 Speciesversicolor                Speciesvirginica       Sepal.Length:Petal.Length        Sepal.Length:Petal.Width  
                           0.05894                        -0.89244                        -0.05394                        -0.04654  
    Sepal.Length:Speciesversicolor   Sepal.Length:Speciesvirginica  
                          -0.32823                        -0.21910
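
Since the question mentions statsmodels: as far as I know, patsy (the formula parser statsmodels has traditionally used) does not accept R's `.` shorthand, so the formula string may have to be generated from the column names instead. Here is a minimal sketch of that idea. The helper name `interaction_formula` and the column names are hypothetical, and it assumes every column name is a valid Python identifier.

```python
def interaction_formula(response, focal, columns):
    """Build 'response ~ focal*x1 + focal*x2 + ...' for every other column."""
    # Every column except the response and the focal feature gets an interaction.
    others = [c for c in columns if c not in (response, focal)]
    if not others:
        raise ValueError("no features left to interact with the focal variable")
    terms = " + ".join(f"{focal}*{c}" for c in others)
    return f"{response} ~ {terms}"


# Hypothetical column names; in practice this would be list(df.columns).
columns = ["Y", "a", "b", "c", "d"]
formula = interaction_formula("Y", "a", columns)
print(formula)  # Y ~ a*b + a*c + a*d
```

The resulting string can then be passed to `statsmodels.formula.api.ols(formula, data=df).fit()`. Because the formula is rebuilt from the column list on every run, adding or removing features needs no manual edits. Column names that are not valid identifiers (spaces, dots, and so on) would need to be wrapped in patsy's `Q("...")` quoting.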