I'm learning the concepts of Spring Cloud Data Flow and wondering what the common way of storing global resources is.
For example, suppose I have a stream with a pmml-processor and would like to retrain the underlying PMML model periodically via a Spring Cloud Task.
Where would I store the model so that the processor can use it as a read-only resource and the task can update it every night? Is there a concept of global storage in Spring Cloud Data Flow? Should I just use a traditional database outside of Spring Cloud, or is there a better way?
There is no general concept of shared storage within Spring Cloud Data Flow itself, but the Spring Resource used to provide the model for the PMML processor is pretty flexible (see http://docs.spring.io/spring/docs/current/spring-framework-reference/html/resources.html and in particular Table 8.1 for a few path options that can be used for the pmml.model-location parameter). So there are a couple of options out of the box:
- the file system (via the file:// protocol);

Additional options (which require including additional jars in the application) are available for S3 (via https://cloud.spring.io/spring-cloud-aws/) and HDFS (via Spring for Apache Hadoop; see http://docs.spring.io/spring-hadoop/docs/current/reference/htmlsingle/#using-hdfs-resource-loader).
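
To make the file-based option concrete, here is a minimal plain-Java sketch of how the nightly task and the processor could share a model file. It is not Spring Cloud Data Flow API: the class SharedModelStore and its methods publish and load are hypothetical names, and it assumes both applications can see the same directory (for example a shared mount). The task writes the new model to a temporary file and renames it into place, so a reader should not see a half-written file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Spring Cloud Data Flow.
class SharedModelStore {

    // Task side: write the retrained model to a temp file in the same
    // directory, then move it over the target in a single step.
    public static void publish(Path target, String pmmlXml) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "model-", ".tmp");
        Files.write(tmp, pmmlXml.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE asks for a rename-style move; whether an existing
        // target is replaced is implementation specific according to the
        // Javadoc (on POSIX file systems the rename replaces it).
        Files.move(tmp, target,
                StandardCopyOption.ATOMIC_MOVE,
                StandardCopyOption.REPLACE_EXISTING);
    }

    // Reader side: open the model through a file: URL, the same kind of
    // location a file-based Spring Resource would point at.
    public static String load(URI location) throws IOException {
        try (InputStream in = location.toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

With this layout the processor would be started with pmml.model-location pointing at the published file, e.g. file:///shared/models/model.pmml (a made-up path). Whether the processor picks up a replaced file without a restart is not covered by this sketch and would need to be checked against the processor's behavior.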