I have an Excel table with one result column and about 8 or 9 columns that affect the result. I know some machine learning methods, such as linear regression and polynomial regression, but I am not an expert.
Which method should I use to find out which columns affect the result the most?
My table has 3000 rows of data, 1 result column, and 9 input columns that affect it.
I recommend one of the following:
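One option is to check whether one or two columns obviously dominate by using a dimension reduction technique such as PCA. After running PCA (on standardized columns, since PCA is sensitive to scale), look at explained_variance_ratio_ to see how much of the variance is explained by each component. If you are lucky, most of the variance lies in one or two directions. Then look at components_ to see which columns those directions correspond to: each row is one component, and the entries with the largest absolute values mark the columns that contribute most to it. (singular_values_ only gives the magnitude of each component, not the columns behind it.) Keep in mind that PCA looks only at the input columns and ignores the result column, so it shows where the inputs vary, not necessarily what drives the result.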
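A minimal sketch of the PCA check, assuming scikit-learn is installed. The array X is synthetic data standing in for the 9 input columns of the Excel table, and the correlation between columns 0 and 1 is made up purely so that one direction stands out. Column contributions are read from components_.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the 3000 x 9 input table (hypothetical data).
X = rng.normal(size=(3000, 9))
# Make columns 0 and 1 strongly correlated so one direction dominates.
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=3000)

# PCA is scale-sensitive, so standardize each column first.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA().fit(X_scaled)

# Fraction of the total input variance captured by each component.
print(pca.explained_variance_ratio_)

# Rows of components_ are components; columns are the original input columns.
# A large absolute value means that column contributes strongly to that component.
loadings = np.abs(pca.components_)
top_columns_pc1 = np.argsort(loadings[0])[::-1][:2]
print("columns contributing most to the first component:", top_columns_pc1)
```

With a real table, X would come from the spreadsheet instead (for example via pandas.read_excel), with the result column left out.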
The simplest option is to use the XGBoost library (XGBRegressor or XGBClassifier, depending on your task): train it and look at feature_importances_. This tells you directly which columns the model relied on most to predict the result.