python, pandas, machine-learning, data-analysis

Data Analysis with Python to Find Most Effective Column


I have an Excel table with one result column and about 8 or 9 columns that affect the result. I know some machine learning methods, such as linear regression and polynomial regression, but I am not an expert.

Which method should I use to find out which column affects the result the most?

My table has 3000 rows, 1 result column, and 9 columns that affect it.


Solution

  • I recommend one of the following:

    Unsupervised Dimension Reduction

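    As a first step, check whether one or two columns stand out by using a dimension reduction technique such as PCA. After running PCA, look at explained_variance_ratio_ to see how much of the variance is explained by each component. If you are lucky, most of the variance lies in one or two directions. You can then look at components_ to see which columns these directions correspond to (singular_values_ only gives the magnitude of each component, not the columns behind it). Keep in mind that PCA never looks at the result column, so it shows which columns dominate the variation in the inputs, not which ones drive the result.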
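
A minimal sketch of the PCA step with scikit-learn. The data here is synthetic and the column names (col_1 to col_9) are hypothetical stand-ins for the real table, which could be loaded with pandas.read_excel instead. Standardizing the columns first is an assumption added here so that columns with large units do not dominate the variance.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 9 input columns (hypothetical names).
# With real data you might use: features = pd.read_excel(...).drop(columns=[...])
rng = np.random.default_rng(0)
n_rows = 3000
features = pd.DataFrame(
    rng.normal(size=(n_rows, 9)),
    columns=[f"col_{i}" for i in range(1, 10)],
)
# Make col_2 nearly a copy of col_1 so one direction carries extra variance.
features["col_2"] = features["col_1"] + 0.1 * rng.normal(size=n_rows)

# Put every column on the same scale before PCA.
scaled = StandardScaler().fit_transform(features)
pca = PCA().fit(scaled)

# Share of the total variance explained by each component.
variance_ratio = pca.explained_variance_ratio_

# Rows are components, columns are the original columns; large absolute
# values show which columns a component is built from.
loadings = pd.DataFrame(
    pca.components_,
    columns=features.columns,
    index=[f"PC{i}" for i in range(1, 10)],
)

print(variance_ratio.round(3))
print(loadings.loc["PC1"].abs().sort_values(ascending=False).head(3))
```

In this toy setup the first component should be built mostly from col_1 and col_2, because they are almost the same column.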

    Supervised ML technique

    The simplest to use is the XGBoost library (XGBRegressor or XGBClassifier, depending on your task). Train it and look at feature_importances_. This tells you directly which columns the model relied on most to make its predictions.