
How to cache a Postgres table as a 3rd Party Persistence on demand in Apache Ignite?


I've been reading the Apache Ignite documentation on 3rd Party Store: https://apacheignite.readme.io/v2.7/docs/3rd-party-store#section-manual

But I still have a few questions:

  1. Can a PostgreSQL table be cached into an already running Ignite instance? Could this be done from an Ignite client?
  2. If a new row is inserted into the PostgreSQL table, will the Ignite cache be refreshed automatically?
  3. Once the PostgreSQL table is cached in Ignite, can the cached data be read using Spark DataFrames with the Ignite data source?

Solution

    1. Yes. Suppose you have a running Ignite server node. You can start a new cache with a cacheStoreFactory configured and call IgniteCache#loadCache(...) on it to pull the table's data into the cache. This can be done from a client node, but every node must have the store factory class on its classpath.
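As a sketch, such a cache backed by a Postgres table can be declared in Spring XML with a CacheJdbcPojoStoreFactory. The cache name, the "postgresDataSource" bean, the "person" table, and the com.example.Person class below are illustrative assumptions, not details from the question:

```xml
<!-- Sketch only: bean names, table and class names are assumptions. -->
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="personCache"/>
    <property name="cacheStoreFactory">
        <bean class="org.apache.ignite.cache.store.jdbc.CacheJdbcPojoStoreFactory">
            <!-- Name of a Spring DataSource bean pointing at the Postgres database -->
            <property name="dataSourceBean" value="postgresDataSource"/>
            <property name="types">
                <list>
                    <bean class="org.apache.ignite.cache.store.jdbc.JdbcType">
                        <property name="cacheName" value="personCache"/>
                        <property name="databaseSchema" value="public"/>
                        <property name="databaseTable" value="person"/>
                        <property name="keyType" value="java.lang.Long"/>
                        <property name="valueType" value="com.example.Person"/>
                        <property name="keyFields">
                            <list>
                                <bean class="org.apache.ignite.cache.store.jdbc.JdbcTypeField">
                                    <constructor-arg><util:constant static-field="java.sql.Types.BIGINT"/></constructor-arg>
                                    <constructor-arg value="id"/>
                                    <constructor-arg value="java.lang.Long"/>
                                    <constructor-arg value="id"/>
                                </bean>
                            </list>
                        </property>
                        <property name="valueFields">
                            <list>
                                <bean class="org.apache.ignite.cache.store.jdbc.JdbcTypeField">
                                    <constructor-arg><util:constant static-field="java.sql.Types.VARCHAR"/></constructor-arg>
                                    <constructor-arg value="name"/>
                                    <constructor-arg value="java.lang.String"/>
                                    <constructor-arg value="name"/>
                                </bean>
                            </list>
                        </property>
                    </bean>
                </list>
            </property>
        </bean>
    </property>
</bean>
```

Once the cache is started, calling ignite.cache("personCache").loadCache(null) (which you can do from a client node) loads the table into memory, provided every node has the POJO and factory classes on its classpath.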

    2. No. Rows inserted into the underlying database are not propagated to the Ignite cache automatically. However, if you enable read-through mode, data that is missing from the Ignite cache will be requested from the database on demand.
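Read-through is a flag on the cache configuration; a minimal sketch (the cache name is an assumption carried over from above):

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="personCache"/>
    <!-- On a cache miss, the entry is loaded from Postgres via the configured store -->
    <property name="readThrough" value="true"/>
    <!-- Optionally propagate cache updates back to Postgres -->
    <property name="writeThrough" value="true"/>
</bean>
```

With this enabled, a cache.get(key) for a key absent from Ignite triggers the configured CacheStore to load that entry from the database.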

    3. Yes, provided the corresponding SQL tables exist in Ignite: configure QueryEntities or indexed types for the cached data so it is visible to Ignite SQL. See https://apacheignite-sql.readme.io/docs/schema-and-indexes and https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-reading-dataframes
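A minimal sketch of the Spark side, assuming the cached type was exposed as an Ignite SQL table named PERSON and that an Ignite client configuration file is available at the path shown (both are assumptions):

```scala
// Sketch: reading an Ignite SQL table into a Spark DataFrame.
// The table name "PERSON" and the config file path are assumptions.
import org.apache.ignite.spark.IgniteDataFrameSettings._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ignite-dataframe-read")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .format(FORMAT_IGNITE)                                    // the "ignite" data source
  .option(OPTION_CONFIG_FILE, "/path/to/ignite-config.xml") // client config that defines the cache
  .option(OPTION_TABLE, "PERSON")                           // Ignite SQL table to read
  .load()

df.show()
```

The data source reads whatever Ignite SQL can see, so the QueryEntity (or indexed-types) configuration on the cache is what makes the cached Postgres data queryable here.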