python-3.x genetic gplearn

Y_train values for symbolicRegressor


I split my dataset into X_train, Y_train, X_test and Y_test, and then I used gplearn's SymbolicRegressor...

I've already converted the string values in the DataFrame to float values. But when I apply the SymbolicRegressor I get this error:

ValueError: could not convert string to float: 'd'

Where 'd' is a value from Y.

Since all the values in Y_train and Y_test are alphabetic characters (they are the "labels"), I can't understand why the SymbolicRegressor tries to convert them to floats.

Any idea?


Solution

  • According to the gplearn documentation (https://gplearn.readthedocs.io/en/stable/index.html), "Symbolic regression is a machine learning technique that aims to identify an underlying mathematical expression that best describes a relationship". Note the word "mathematical". I am not an expert on this topic, and gplearn's description does not clearly define its area of applicability or its restrictions.

    However, according to the source code (https://gplearn.readthedocs.io/en/stable/_modules/gplearn/genetic.html), the fit() method of the BaseSymbolic class contains the line X, y = check_X_y(X, y, y_numeric=True), where check_X_y() is sklearn.utils.validation.check_X_y(). The y_numeric argument means: "Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms".

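    So the y values must be numeric: a string label such as 'd' cannot be converted to float64, which is what raises the ValueError.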
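
The validation step described above can be reproduced without gplearn. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the arrays X and y_labels are made-up illustration data, not the asker's dataset. It calls check_X_y() with y_numeric=True on object-dtype string labels (the dtype a pandas column of strings normally has), and then again after mapping the labels to integers with LabelEncoder.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.validation import check_X_y

# Hypothetical toy data: four samples, two numeric features each.
X = np.array([[0.1, 1.2], [0.4, 0.8], [0.9, 0.3], [0.7, 0.5]])
# String labels stored with object dtype, as in a typical pandas column.
y_labels = np.array(["a", "d", "a", "d"], dtype=object)

# Same call that the answer quotes from BaseSymbolic.fit().
try:
    check_X_y(X, y_labels, y_numeric=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print("Validation error:", error_message)

# Map each distinct label to an integer code so that y is numeric.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_labels)

# With integer codes the same validation call succeeds.
X_checked, y_checked = check_X_y(X, y_encoded, y_numeric=True)
print("Classes:", list(encoder.classes_))
print("Encoded y:", list(y_checked))
```

Encoding the labels only gets the data past validation. The integer codes impose an arbitrary ordering on the classes, so a regressor fitted on them may not be meaningful. If the task is really classification, it may be worth checking whether gplearn's SymbolicClassifier suits the data; whether it does is not something the original answer confirms.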