I see multiple examples of loading and processing data with Apache Ignite. But how do I export data from an Ignite cache after it’s been processed?
I'm looking to implement processing of some large CSV files on a cluster. Say it’s a simple transformation that preprocesses data in a specific column. After I’m finished with it, how do I get the data out of the cache and into an S3 bucket or some other location? My data will be partitioned across the nodes for speed of loading and loaded as a KV cache.
Is there a standard mechanism to export data from a cache (CSV in / CSV out)? I've found that ML models can leverage the Exporter APIs, but that's not my use case.
Are scan queries a standard way to achieve what I want?
If you want to export the entire data set, then yes. A ScanQuery combined with affinityRun for every partition is probably the most efficient way to iterate over all cache entries and export them.
With affinityRun, we ask every node to export its own part of the data, instead of pulling all the data to a single node for export.
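Here is a minimal sketch of that approach in Java, not a definitive implementation. It assumes a cache of String keys and String values; the cache name `csvRows`, the output directory `/tmp/ignite-export`, and the `toCsvLine` helper are hypothetical names made up for illustration. Each job runs on the node that owns the partition, scans only that partition locally, and writes one CSV file per partition to that node's local disk.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class PartitionCsvExport {
    // Hypothetical names, chosen only for this sketch.
    static final String CACHE_NAME = "csvRows";
    static final String OUT_DIR = "/tmp/ignite-export";

    /** Quotes a field if it contains a comma, a quote, or a line break. */
    static String csvField(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r"))
            return "\"" + s.replace("\"", "\"\"") + "\"";
        return s;
    }

    /** Assumption: one output row per entry, with key and value as two columns. */
    static String toCsvLine(String key, String value) {
        return csvField(key) + "," + csvField(value);
    }

    /** Job executed on the node that owns the given partition. */
    static class ExportPartitionJob implements IgniteRunnable {
        @IgniteInstanceResource
        private transient Ignite ignite; // injected on the executing node

        private final int part;

        ExportPartitionJob(int part) {
            this.part = part;
        }

        @Override public void run() {
            IgniteCache<String, String> cache = ignite.cache(CACHE_NAME);

            // Scan only this partition, and only on the local node.
            ScanQuery<String, String> qry = new ScanQuery<>();
            qry.setPartition(part);
            qry.setLocal(true);

            Path out = Paths.get(OUT_DIR, "part-" + part + ".csv");
            try {
                Files.createDirectories(out.getParent());
                try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8);
                     QueryCursor<Cache.Entry<String, String>> cur = cache.query(qry)) {
                    for (Cache.Entry<String, String> e : cur) {
                        w.write(toCsvLine(e.getKey(), e.getValue()));
                        w.newLine();
                    }
                }
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
    }

    /** Sends one export job per partition to the node that owns it. */
    static void exportAll(Ignite ignite) {
        int parts = ignite.affinity(CACHE_NAME).partitions();
        for (int p = 0; p < parts; p++)
            ignite.compute().affinityRun(
                Collections.singletonList(CACHE_NAME), p, new ExportPartitionJob(p));
    }

    public static void main(String[] args) {
        // Single-node demo with the default configuration; a real job would
        // join an existing cluster where the cache is already loaded.
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<String, String> cache = ignite.getOrCreateCache(CACHE_NAME);
            cache.put("row-1", "alice,42");
            cache.put("row-2", "bob,17");
            exportAll(ignite);
        }
    }
}
```

Because the files end up on whichever node owns each partition, getting them to S3 would be a separate step, for example uploading each file from inside the job instead of (or after) writing it locally. That part is not shown here. The loop in `exportAll` submits partitions one at a time; the asynchronous variant of affinityRun could be used to run them in parallel.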