Tags: java, dataframe, apache-spark, transform, apache-spark-mllib

Best way to create a custom Transformer in Java Spark ML


I am learning big data with Apache Spark, and I want to create a custom transformer for Spark ML so that I can execute aggregate functions or perform other operations on a dataset.


Solution

  • You need to extend the org.apache.spark.ml.Transformer class. It is an abstract class, so you have to provide implementations of its abstract methods.
    In most cases you need to provide an implementation of the transform(Dataset<?> dataset) method and of String uid().
    Example:

    import org.apache.spark.ml.Transformer;
    import org.apache.spark.ml.param.ParamMap;
    import org.apache.spark.ml.util.Identifiable;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.types.StructType;

    public class CustomTransformer extends Transformer {

        private final String uid_;

        public CustomTransformer() {
            this(Identifiable.randomUID("CustomTransformer"));
        }

        public CustomTransformer(String uid) {
            this.uid_ = uid;
        }

        @Override
        public String uid() {
            return uid_;
        }

        @Override
        public Transformer copy(ParamMap extra) {
            return defaultCopy(extra);
        }

        @Override
        public Dataset<Row> transform(Dataset<?> dataset) {
            // Do your work here (aggregations, derived columns, etc.)
            // and return the resulting Dataset<Row>.
            return dataset.toDF();
        }

        @Override
        public StructType transformSchema(StructType schema) {
            // Return the output schema; here the input schema is passed through unchanged.
            return schema;
        }
    }
    I am also new to this, so I suggest you learn what each of these abstract methods is used for; a short usage sketch follows below.
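
For reference, here is a minimal usage sketch, assuming the pass-through CustomTransformer shown above. The local SparkSession, the application name, and the toy single-column dataset are illustrative only; in a real job you would plug the transformer into your own data and pipeline.

    import org.apache.spark.ml.Pipeline;
    import org.apache.spark.ml.PipelineModel;
    import org.apache.spark.ml.PipelineStage;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class CustomTransformerExample {
        public static void main(String[] args) {
            // Illustrative local session; use your own SparkSession in a real job.
            SparkSession spark = SparkSession.builder()
                    .appName("CustomTransformerExample")
                    .master("local[*]")
                    .getOrCreate();

            // Toy single-column dataset, just to have something to transform.
            Dataset<Row> input = spark.range(5).toDF("value");

            // Use the transformer directly ...
            CustomTransformer custom = new CustomTransformer();
            custom.transform(input).show();

            // ... or as a stage in an ML Pipeline.
            Pipeline pipeline = new Pipeline()
                    .setStages(new PipelineStage[]{custom});
            PipelineModel model = pipeline.fit(input);
            model.transform(input).show();

            spark.stop();
        }
    }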