Python Pipeline Custom Transformer

I am trying to code a custom transformer to be used in a pipeline to pre-process data.

Here is the code I'm using (sourced - not written by me). It takes in a dataframe, scales the features, and returns a dataframe:

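import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler

class DFStandardScaler(BaseEstimator, TransformerMixin):
    """StandardScaler that takes a DataFrame and returns a DataFrame."""

    def __init__(self):
        self.ss = None

    def fit(self, X, y=None):
        # Learn the per-column mean and standard deviation.
        self.ss = StandardScaler().fit(X)
        return self

    def transform(self, X):
        Xss = self.ss.transform(X)
        # Re-attach the original index and column labels to the scaled array.
        Xscaled = pd.DataFrame(Xss, index=X.index, columns=X.columns)
        return Xscaled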

My data has both categorical and continuous features. The scaler should not be applied to the categorical feature ('sex'), but when I fit this pipeline with the dataframe below it throws an error because it tries to scale the categorical labels in 'sex':

     sex  length  diameter  height  whole_weight  shucked_weight  \
0      M   0.455     0.365   0.095        0.5140          0.2245   
1      M   0.350     0.265   0.090        0.2255          0.0995   
2      F   0.530     0.420   0.135        0.6770          0.2565   
3      M   0.440     0.365   0.125        0.5160          0.2155   
4      I   0.330     0.255   0.080        0.2050          0.0895   
5      I   0.425     0.300   0.095        0.3515          0.1410

How do I pass a list of categorical / continuous features into the transformer so it will scale the proper features? Or is it better to somehow code the feature type check inside the transformer?

Solution

You need another step in the Pipeline: a similar class, also inheriting from BaseEstimator and TransformerMixin, that selects the columns to pass on to the next step:

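from sklearn.base import BaseEstimator, TransformerMixin

class ColumnSelector(BaseEstimator, TransformerMixin):
    def __init__(self, columns: list):
        # The attribute name must match the argument name,
        # otherwise get_params() and clone() cannot find it.
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; the columns are given up front.
        return self

    def transform(self, X, y=None):
        return X.loc[:, self.columns]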
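
As for the type check the question asks about, one option is a selector that picks columns by dtype instead of by name. The following is only a minimal sketch: `DtypeSelector` and its `include` / `exclude` parameters are hypothetical names introduced here, and it relies on pandas' `DataFrame.select_dtypes`. The sample values are the first three rows of the dataframe in the question.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DtypeSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: keep columns by dtype rather than by name."""

    def __init__(self, include=None, exclude=None):
        # Stored under the same names as the arguments so get_params()/clone() work.
        self.include = include
        self.exclude = exclude

    def fit(self, X, y=None):
        # Remember the matching columns at fit time so transform stays consistent.
        selected = X.select_dtypes(include=self.include, exclude=self.exclude)
        self.columns_ = list(selected.columns)
        return self

    def transform(self, X):
        return X.loc[:, self.columns_]


df = pd.DataFrame({
    "sex": ["M", "M", "F"],
    "length": [0.455, 0.350, 0.530],
    "diameter": [0.365, 0.265, 0.420],
})

numeric = DtypeSelector(include="number").fit_transform(df)
categorical = DtypeSelector(exclude="number").fit_transform(df)
```

An explicit column list is easier to audit, while a dtype check adapts when columns are added. Either kind of selector can be used as the first step of a pipeline in the same way.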

Then, in your main script, the pipeline looks like this:

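from sklearn import pipeline

# Numeric branch: pick the continuous columns, then scale them.
selector = ColumnSelector(['length', 'diameter', 'height', 'whole_weight', 'shucked_weight'])
pipe = pipeline.make_pipeline(
    selector,
    DFStandardScaler()
)

# Categorical branch.
pipe2 = pipeline.make_pipeline(
    # some steps for the sex column
)

# make_union runs both branches on the same input and concatenates their outputs column-wise.
full_pipeline = pipeline.make_pipeline(
    pipeline.make_union(
        pipe,
        pipe2
    ),
    # some other step
)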
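
To make the outline above concrete, here is a self-contained sketch that can be run as is. Two things in it are assumptions, not part of the answer: the steps for the 'sex' column are left unspecified above, so `OrdinalEncoder` is used purely as a stand-in (one-hot encoding would be a more common choice for an unordered category), and only three of the numeric columns from the question's dataframe are included to keep it short.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import OrdinalEncoder, StandardScaler


class ColumnSelector(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.loc[:, self.columns]


class DFStandardScaler(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        self.ss_ = StandardScaler().fit(X)
        return self

    def transform(self, X):
        scaled = self.ss_.transform(X)
        return pd.DataFrame(scaled, index=X.index, columns=X.columns)


# First six rows from the question, restricted to four columns.
df = pd.DataFrame({
    "sex": ["M", "M", "F", "M", "I", "I"],
    "length": [0.455, 0.350, 0.530, 0.440, 0.330, 0.425],
    "diameter": [0.365, 0.265, 0.420, 0.365, 0.255, 0.300],
    "height": [0.095, 0.090, 0.135, 0.125, 0.080, 0.095],
})
numeric_cols = ["length", "diameter", "height"]

# Numeric branch: select the continuous columns, then scale them.
numeric_pipe = make_pipeline(ColumnSelector(numeric_cols), DFStandardScaler())

# Categorical branch. Assumption: OrdinalEncoder stands in for the
# unspecified "steps for the sex column".
sex_pipe = make_pipeline(ColumnSelector(["sex"]), OrdinalEncoder())

# Both branches see the full dataframe; their outputs are stacked side by side.
features = make_union(numeric_pipe, sex_pipe)
Xt = features.fit_transform(df)
```

Note that the union returns a NumPy array, not a DataFrame, so the column labels restored by DFStandardScaler are lost at this point; the first three columns of `Xt` are the scaled numeric features and the last one is the encoded 'sex' column.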