
Ignite Community Edition: Configuring IGFS with HDFS as persistent storage


I would like more details on configuring Apache Ignite (an IGFS cluster) with HDFS. I can't find any official reference, so I suspect this isn't possible with the open-source version of Apache Ignite and that I would need to switch to something like GridGain. Is that true? I want to use Apache Ignite to perform in-memory computation with Spark, and I would like a "kind of automatic" sync with Hadoop HDFS as the backend storage, because I don't want to load data from HDFS manually.

Thanks


Solution

  • You can still use Apache Ignite's integration with Spark to work with HDFS:

    https://ignite.apache.org/docs/latest/extensions-and-integrations/ignite-for-spark/overview#supported-spark-version

    There are currently integrations for Spark 2.3, 2.4, and 3.0. The 3.0 integration was added fairly recently and, for some reason, is not in the documentation yet, but you can find it here:

    https://downloads.apache.org/ignite/ignite-extensions/ignite-spark-ext/3.0.0/

    You can also watch my webinar about this integration:

    https://www.youtube.com/watch?v=lkRh2TO8VSU

    You can also see an example here:

    https://github.com/GridGain-Demos/spark-hdfs-ignite-aws-deployment-demo/blob/master/spark_example_project/src/main/java/test/SparkIgniteLoaderFromHdfs.java
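As a rough illustration of the approach above, the sketch below reads a CSV file from HDFS with Spark and writes it into an Ignite SQL table through the ignite-spark DataFrame integration (`IgniteDataFrameSettings`). It is not taken from the linked demo. The HDFS path, the Ignite client config `config/ignite-client.xml`, the `person` table, and its `id` column are all hypothetical, and it assumes Spark plus a matching ignite-spark module are on the classpath.

```java
import org.apache.ignite.spark.IgniteDataFrameSettings;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

/**
 * Hypothetical loader: copies a CSV file from HDFS into an Ignite SQL table
 * using Spark and the ignite-spark DataFrame integration.
 */
public class HdfsToIgniteLoader {
    // Hypothetical: Ignite Spring XML config (client mode) available to the driver and executors.
    static final String IGNITE_CONFIG = "config/ignite-client.xml";

    // Hypothetical: replace with your NameNode address and file path.
    static final String HDFS_PATH = "hdfs://namenode:8020/data/person.csv";

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("hdfs-to-ignite")
            .getOrCreate();

        // Read the source data from HDFS; the CSV is assumed to have a header row with an "id" column.
        Dataset<Row> df = spark.read()
            .option("header", "true")
            .option("inferSchema", "true")
            .csv(HDFS_PATH);

        // Write the rows into the Ignite table "person". The primary-key and table
        // parameters are used when ignite-spark has to create the table.
        df.write()
            .format(IgniteDataFrameSettings.FORMAT_IGNITE())
            .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), IGNITE_CONFIG)
            .option(IgniteDataFrameSettings.OPTION_TABLE(), "person")
            .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS(), "id")
            .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PARAMETERS(), "template=partitioned,backups=1")
            .mode(SaveMode.Append)
            .save();

        spark.stop();
    }
}
```

You would submit this with `spark-submit` like any other Spark job. Note that it performs a one-time copy per run rather than a continuous sync, so you would re-run or schedule the job whenever the data in HDFS changes.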