Tags: apache-spark, pyspark, pipeline, apache-spark-ml

PySpark - How to show what components are included in a Pipeline?


In the code below, a PySpark pipeline contains two stages. Given the pipeline, how can I print out the names of these two stages?

from pyspark.ml.feature import StringIndexer, OneHotEncoder
from pyspark.ml import Pipeline

# Stage 1: map the string column 'Sex' to numeric indices
gender_indexer = StringIndexer(inputCol='Sex', outputCol='SexIndex')
# Stage 2: one-hot encode the indices into a vector column
gender_encoder = OneHotEncoder(inputCol='SexIndex', outputCol='SexVec')

pipeline = Pipeline(stages=[gender_indexer, gender_encoder])

Solution

  • pipeline.getStages() returns the list of stages in the pipeline:

    >>> pipeline.getStages()
    [StringIndexer_84633f93b8f6, OneHotEncoder_6a01b7a7cdc1]
    

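    Note that each list element is a stage object, not a string; what gets printed is each stage's uid. Use type(stage).__name__ for the class name, or stage.uid for the unique id as a string.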