I implemented BPSO as a feature selection approach using the pyswarms library. I followed this tutorial.
Is there a way to limit the maximum number of features? If not, are there other particle swarm (or genetic/simulated annealing) python-implementations that have this functionality?
An easy way is to introduce a penalty for the number of features used. In the following code, an objective function is defined:
def f_per_particle(m, alpha):
    # Build the feature subset from the particle's binary mask m
    if np.count_nonzero(m) == 0:
        X_subset = X
    else:
        X_subset = X[:, m == 1]
    # Perform classification and store performance in P
    classifier.fit(X_subset, y)
    P = (classifier.predict(X_subset) == y).mean()
    # Compute the objective: trade-off between error and subset size
    j = (alpha * (1.0 - P)
         + (1.0 - alpha) * (1 - (X_subset.shape[1] / total_features)))
    return j
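For context, here is a minimal sketch of how such a per-particle objective is typically plugged into pyswarms' BinaryPSO; the hyperparameter values (c1, c2, w, k, p, particle count, iterations) are placeholders, not tuned recommendations:

import numpy as np
import pyswarms as ps

def f(x, alpha=0.88):
    # x has shape (n_particles, n_features); evaluate each particle's mask
    return np.array([f_per_particle(x[i], alpha) for i in range(x.shape[0])])

# BinaryPSO expects c1, c2, w plus the neighbourhood options k and p
options = {'c1': 0.5, 'c2': 0.5, 'w': 0.9, 'k': 30, 'p': 2}
optimizer = ps.discrete.BinaryPSO(n_particles=30, dimensions=total_features,
                                  options=options)
cost, pos = optimizer.optimize(f, iters=100)
selected = pos == 1  # boolean mask of the selected features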
What you could do is add a penalty when the number of features goes above max_num_features, e.g.:
features_count = np.count_nonzero(m)
features_overflow = np.clip(features_count - max_num_features, 0, 10)
feature_overflow_penalty = features_overflow / 10
and define a new objective with the penalty added (pyswarms minimizes the cost, so exceeding the limit must increase j):

j = (alpha * (1.0 - P)
     + (1.0 - alpha) * (1 - (X_subset.shape[1] / total_features))
     + feature_overflow_penalty)
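Putting the pieces together, a rough, untested sketch of the modified per-particle objective could look like this (max_num_features and the 0-10 clipping range are assumptions you would have to tune):

def f_per_particle(m, alpha, max_num_features=10):
    # Feature subset from the particle's binary mask
    if np.count_nonzero(m) == 0:
        X_subset = X
    else:
        X_subset = X[:, m == 1]
    classifier.fit(X_subset, y)
    P = (classifier.predict(X_subset) == y).mean()
    # Penalty grows once the particle uses more than max_num_features
    features_count = np.count_nonzero(m)
    feature_overflow_penalty = np.clip(features_count - max_num_features, 0, 10) / 10
    return (alpha * (1.0 - P)
            + (1.0 - alpha) * (1 - (X_subset.shape[1] / total_features))
            + feature_overflow_penalty)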
This is not tested, and there is still work to do to find the right penalty. An alternative is to never suggest/try more features than a certain threshold in the first place, as sketched below.
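As far as I know, pyswarms does not expose such a hard constraint directly, but one way to approximate it is to "repair" each particle's mask before evaluation, randomly switching off surplus bits so at most max_num_features remain active. This is a hypothetical sketch (repair_mask and f_per_particle_capped are made-up helpers, not part of the library):

def repair_mask(m, max_num_features, rng=None):
    # Hypothetical helper: if too many bits are set, randomly switch off the surplus
    if rng is None:
        rng = np.random.default_rng()
    active = np.flatnonzero(m)
    if active.size > max_num_features:
        drop = rng.choice(active, size=active.size - max_num_features, replace=False)
        m = m.copy()
        m[drop] = 0
    return m

def f_per_particle_capped(m, alpha, max_num_features=10):
    # Evaluate the repaired mask so no particle ever uses more than the cap
    return f_per_particle(repair_mask(m, max_num_features), alpha)

Note that repairing the mask only changes what gets evaluated, not the particle's stored position, so the swarm can still propose oversized masks; it simply never gets rewarded for them.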