I have a dataset (x, y) where x is an n-dimensional vector and y is an m-dimensional vector (m = 3, n > 2). My goal is to find the polynomial in x that best fits the (x, y) dataset.
The dimension of x is fairly large (right now it is 25), and I don't want to enter all the possible terms manually (i.e. x1*x3*x5, x1*x4*x6, ...). I can use Matlab, Mathematica and R. How can I do this?
I would also be interested in your suggestions on the following problem: how can I choose the most relevant coefficients from the result? (For example, maybe x1*x2 is more relevant than x2*x3.)
Thank you
This question is not really about any of the analysis platforms, but rather about how to do multivariate analysis properly. As such, it should be augmented with a description of the subject area. There also needs to be appropriate consideration of the implicit multiple testing that occurs when many candidate terms are screened, and of what sort of penalization should be applied to avoid inflating the inferential statistics. Bottom line: you should read Frank Harrell's "Regression Modeling Strategies", where each of these sentences gets expanded into a full-length chapter. (I also think the question is overly broad and should be closed or migrated to stats.stackexchange.) It is not ready for prime-time coding.
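For what it's worth, the mechanical part (not typing out every product by hand) and the penalization point can be illustrated with a short sketch. This is in Python with NumPy rather than Matlab, Mathematica or R, purely for illustration. The function names (monomial_terms, design_matrix, ridge_fit), the synthetic data, the choice of a ridge penalty and the value of lam are all my assumptions, not something taken from the question. Note that with n = 25 and total degree up to 3 there are already 3276 candidate terms, which is why the multiple-testing and penalization concerns matter.

```python
from itertools import combinations_with_replacement

import numpy as np


def monomial_terms(n_vars, max_degree):
    """Index tuples for every monomial of total degree 0..max_degree.

    () is the constant term, (0, 1) means x0*x1, (2, 2) means x2**2.
    """
    terms = []
    for degree in range(max_degree + 1):
        terms.extend(combinations_with_replacement(range(n_vars), degree))
    return terms


def design_matrix(X, terms):
    """One column per monomial, evaluated on every row of X."""
    # The product over an empty index list is 1, which gives the intercept.
    return np.column_stack([np.prod(X[:, list(t)], axis=1) for t in terms])


def ridge_fit(A, Y, lam):
    """Ridge-penalized least squares; the intercept column is not penalized.

    Assumes column 0 of A is the constant term.
    """
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ Y)


# Synthetic, noise-free example (assumption): 3 inputs, 3 outputs, degree 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([
    1 + 2 * X[:, 0] * X[:, 1],  # y1 = 1 + 2*x0*x1
    -3 * X[:, 2],               # y2 = -3*x2
    X[:, 0] ** 2,               # y3 = x0**2
])

terms = monomial_terms(3, 2)
A = design_matrix(X, terms)
W = ridge_fit(A, Y, lam=1e-8)  # W has one row per term, one column per output

# Number of candidate terms for the sizes mentioned in the question.
n_candidates = len(monomial_terms(25, 3))
```

Each row of W corresponds to the monomial at the same position in terms, so terms.index((0, 1)) locates the coefficient of x0*x1. Coefficient magnitudes are only comparable across terms if the columns of the design matrix are on a common scale, and in a real problem lam would need to be chosen by something like cross-validation rather than fixed as above. None of this replaces the modeling considerations already mentioned.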