I am learning big data with Apache Spark, and I want to create a custom Transformer for Spark ML so that I can execute aggregate functions or perform other operations on a dataset.
You need to extend the abstract class org.apache.spark.ml.Transformer and provide implementations of its abstract methods.
In most cases you need to implement the transform(Dataset&lt;?&gt; dataset) method and the String uid() method.
Example:
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.util.Identifiable;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructType;

public class CustomTransformer extends Transformer {
    private final String uid_;

    public CustomTransformer() {
        this(Identifiable.randomUID("customTransformer"));
    }

    public CustomTransformer(String uid) {
        this.uid_ = uid;
    }

    @Override
    public String uid() {
        return uid_;
    }

    @Override
    public Transformer copy(ParamMap extra) {
        return defaultCopy(extra);
    }

    @Override
    public Dataset<Row> transform(Dataset<?> dataset) {
        // do your work here and return the transformed Dataset;
        // this placeholder just passes the input through as a DataFrame
        return dataset.toDF();
    }

    @Override
    public StructType transformSchema(StructType schema) {
        // return the schema of the Dataset produced by transform();
        // here the input schema is passed through unchanged
        return schema;
    }
}
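As a rough sketch of how you might then use it (the input path, the local SparkSession setup, and the file contents are all hypothetical here), the custom transformer can be called directly or plugged into a Pipeline like any built-in stage:

```java
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CustomTransformerExample {
    public static void main(String[] args) {
        // Local session for experimentation; on a cluster you would
        // configure the master differently.
        SparkSession spark = SparkSession.builder()
                .appName("custom-transformer-example")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical input file, just for illustration.
        Dataset<Row> input = spark.read().json("input.json");

        // Use the transformer directly...
        CustomTransformer transformer = new CustomTransformer();
        Dataset<Row> transformed = transformer.transform(input);
        transformed.show();

        // ...or as a stage in a Pipeline alongside built-in stages.
        Pipeline pipeline = new Pipeline()
                .setStages(new PipelineStage[]{transformer});
        PipelineModel model = pipeline.fit(input);
        model.transform(input).show();

        spark.stop();
    }
}
```

Because Transformer extends PipelineStage, the same class works in both styles without any extra code.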
I am also new to this, so I suggest you learn what each of these abstract methods is used for before building on this template.