apache-spark cassandra apache-spark-ml

Saving Spark ML pipeline to a database


Is it possible to save a Spark ML pipeline to a database (Cassandra, for example)? In the documentation I can only see the option to save to a path:

myMLWritable.save(toPath);

Is there a way to wrap or replace the MLWriter instance returned by myMLWritable.write() and redirect its output to the database?


Solution

  • It is not possible (or at least not supported) at the moment. MLWriter is not extensible and relies on Parquet files and a directory structure to represent models.

    Technically speaking, you can extract the individual components and use the internal private API to recreate the models from scratch, but that is likely the only option.
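Because the writer produces a directory of files, one indirect workaround (not part of the answer above, and only a sketch) is to let save(toPath) write to a temporary directory and then pack that directory into a single byte array that a database could hold as a blob. This does not redirect MLWriter itself. The class name ModelDirArchive and its methods are hypothetical, the code uses only the Java standard library, and it assumes the saved directory is readable on the local filesystem (for example in local mode). Writing the bytes to Cassandra depends on the driver in use and is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: packs a saved model directory into bytes and back. */
final class ModelDirArchive {

    /** Zips every regular file under dir (e.g. the path given to save) into one byte array. */
    static byte[] pack(Path dir) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer);
             Stream<Path> walk = Files.walk(dir)) {
            List<Path> files = walk.filter(Files::isRegularFile)
                                   .sorted()
                                   .collect(Collectors.toList());
            for (Path file : files) {
                // Entry names are relative to dir and use '/' as the ZIP format expects.
                String name = dir.relativize(file).toString()
                                 .replace(File.separatorChar, '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The ZIP stream is closed here, so the archive is complete.
        return buffer.toByteArray();
    }

    /** Restores the archive under target so that a path-based load could read it again. */
    static void unpack(byte[] archive, Path target) {
        Path root = target.normalize();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                Path out = root.resolve(e.getName()).normalize();
                // Refuse entries that would escape the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Unsafe entry: " + e.getName());
                }
                if (e.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In use, the flow would be myMLWritable.save(tmpDir), then ModelDirArchive.pack(tmpDir) to obtain the bytes to store. To restore, unpack the bytes into a fresh directory and call the model's path-based load method on it. Large models may produce archives too big for a single database value, so the bytes may need to be split into chunks.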