python, machine-learning, scikit-learn, data-mining, dimensionality-reduction

How does the SelectPercentile score function work?


I have recently been studying dimensionality reduction methods and found that the Python package "sklearn.feature_selection" looks quite useful. The problem is that the documentation for SelectPercentile.fit doesn't explain how the scores are calculated.


Does anyone know how it works?

For instance, suppose I use "SelectFdr" in place of "SelectPercentile". SelectFdr's selection criterion depends on the p-value of each feature. How do I find out how "SelectFdr" sets up its hypotheses or defines its error rates?

SelectFdr is described as "Select features based on an estimated false discovery rate." So it must first apply some classification method in order to calculate the false discovery rate. My question is which classification method "SelectPercentile" uses.


Solution

  • See the comments in the source code at this link: https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/feature_selection/univariate_selection.py#L368

    You can pass the score function as a parameter (score_func). If you don't specify one, the default is f_classif, the ANOVA F-test.
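To make the point above concrete, here is a minimal sketch, assuming a recent scikit-learn and using the bundled iris dataset purely for illustration (the dataset, the percentile of 50, and the alpha of 0.05 are my choices, not part of the question). It suggests that no classifier is involved: the selector calls score_func(X, y) once and stores the per-feature scores and p-values it returns. With f_classif, each p-value comes from a one-way ANOVA F-test whose null hypothesis is that the feature has the same mean in every class. SelectFdr then applies the Benjamini-Hochberg procedure to those same p-values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFdr, SelectPercentile, f_classif

# Illustrative data only: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Pass the score function explicitly; f_classif is also the default.
selector = SelectPercentile(score_func=f_classif, percentile=50).fit(X, y)

# Calling the score function directly gives the same numbers the selector stored.
f_scores, p_values = f_classif(X, y)
scores_match = np.allclose(selector.scores_, f_scores)
pvalues_match = np.allclose(selector.pvalues_, p_values)

# Boolean mask of the features whose scores fall in the top 50 percent.
kept = selector.get_support()
n_kept = int(kept.sum())

# SelectFdr works from the same per-feature p-values, with alpha as the
# upper bound on the expected false discovery rate.
fdr = SelectFdr(score_func=f_classif, alpha=0.05).fit(X, y)
fdr_pvalues_match = np.allclose(fdr.pvalues_, p_values)

print("F scores:", selector.scores_)
print("p-values:", selector.pvalues_)
print("kept by SelectPercentile:", kept)
print("kept by SelectFdr:", fdr.get_support())
```

Swapping in another score function such as chi2 or mutual_info_classif changes how the scores are computed, but the selection step stays the same. Note that mutual_info_classif returns scores only, so it is not suitable for p-value based selectors like SelectFdr.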