What's the Neuraxle way to select a subset of columns from a dataset? This is how I am doing it with sklearn:
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

class ColumnSelectTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn: column selection is stateless.
        return self

    def transform(self, X):
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)
        return X[self.columns]

# Set up SIMPLE FEATURES
simple_cols = ['BEDCERT', 'RESTOT', 'INHOSP', 'CCRC_FACIL',
               'SFF', 'CHOW_LAST_12MOS', 'SPRINKLER_STATUS',
               'EXP_TOTAL', 'ADJ_TOTAL']
simple_features = Pipeline([
    ('cst', ColumnSelectTransformer(simple_cols)),
    ('impute', SimpleImputer())
])
EDIT:
I think this is one solution, but I'm not 100% convinced.
class ColumnSelectTransformer(BaseTransformer, ForceHandleMixin):
    def __init__(self, required_columns):
        BaseTransformer.__init__(self)
        ForceHandleMixin.__init__(self)
        self.required_columns = required_columns

    def inverse_transform(self, processed_outputs):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)
        return X[self.required_columns]
Update: this was fixed. See usage example of the column transformer here: https://www.neuraxle.org/stable/examples/sklearn/plot_cyclical_feature_engineering.html#sphx-glr-examples-sklearn-plot-cyclical-feature-engineering-py
There is already an issue for this: https://github.com/Neuraxio/Neuraxle/issues/168
I would be tempted to not use Pandas for now, and instead use the provided ColumnTransformer: https://www.neuraxle.org/stable/api/neuraxle.steps.column_transformer.html
If you fully code (and properly unit test) your Pandas transformer, we'd be glad to have your contribution: open a pull request on Neuraxle and we'll add you as a contributor.
Until then, you could code a simple PandasToNumpy step that returns the .values in its call to transform, and then use the existing ColumnTransformer of Neuraxle by providing the integer positions of the desired columns instead of their string names.
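To get those integers from the column names, a minimal sketch in plain Python could look like the following. Note that `column_indices` and the `all_columns` list are hypothetical, made up for illustration and not part of Neuraxle; with a DataFrame you would presumably pass `list(df.columns)` instead.

```python
def column_indices(all_columns, wanted):
    """Return the integer position of each wanted column name.

    Hypothetical helper (not a Neuraxle API). Assumes column names
    are unique; raises KeyError if a wanted name is absent.
    """
    positions = {name: i for i, name in enumerate(all_columns)}
    missing = [name for name in wanted if name not in positions]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [positions[name] for name in wanted]


# Assumed column order of the dataset, for illustration only.
all_columns = ['PROVNUM', 'BEDCERT', 'RESTOT', 'INHOSP', 'STATE']
simple_idx = column_indices(all_columns, ['BEDCERT', 'RESTOT', 'INHOSP'])
print(simple_idx)
```

The resulting list of integers is what you would hand to the column-selecting step once the data has been converted to a NumPy array.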
Also note that you can inherit from the NonFittableMixin to override the fit as a return self without additional code.