apache-spark, hadoop, hdfs

Apache Spark cluster without HDFS?


I'm trying to understand whether it is possible to run Spark on a cluster without Hadoop services. It seems like it should be, given pages like standalone Spark and Spark on Mesos, but even those don't offer an alternative to HDFS. Is there one? The goal is to find a way to deploy a Spark cluster without also signing up to manage a Hadoop cluster.


Solution

  • Yes. Spark does not need HDFS: NFS, S3 (or an S3-compatible store such as MinIO), GCS, Azure WASB, Databricks DBFS, and Ceph all work with Spark.

    According to the note on its documentation page, Spark on Mesos is deprecated in favor of Kubernetes, which is even more complicated to manage than Hadoop.
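
As a rough illustration of the storage options above, here is a minimal PySpark sketch that reads from an NFS mount and writes to an S3-compatible store through the S3A connector. The host names, bucket, paths, and the helper names `s3a_conf`, `build_session`, and `demo` are all made up for the example. It also assumes the `hadoop-aws` connector jar is on the Spark classpath and that the NFS share is mounted at the same path on every node.

```python
def s3a_conf(endpoint, access_key, secret_key):
    """Spark properties for an S3-compatible store (e.g. MinIO) via S3A.

    The "spark.hadoop." prefix passes each setting through to the
    Hadoop filesystem configuration that Spark uses internally.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style URLs are commonly needed for non-AWS endpoints.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def build_session(app_name, master, conf):
    """Create a SparkSession with the given properties applied."""
    # Imported here so the config helper works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master(master)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def demo():
    """Hypothetical job: no HDFS involved at any step."""
    conf = s3a_conf(
        "http://minio.example.internal:9000",  # assumed MinIO endpoint
        "ACCESS_KEY_PLACEHOLDER",
        "SECRET_KEY_PLACEHOLDER",
    )
    # Assumed standalone master URL; any cluster manager would do.
    spark = build_session(
        "no-hdfs-demo", "spark://spark-master.example.internal:7077", conf
    )
    # file:// paths must resolve on every worker, hence the shared NFS mount.
    df = spark.read.text("file:///mnt/shared/input.txt")
    df.write.mode("overwrite").parquet("s3a://demo-bucket/output/")
    spark.stop()
```

Calling `demo()` only makes sense against a real cluster with those services available. The point is just that the storage choice shows up as a URI scheme (`file://`, `s3a://`) plus a few connector properties, not as a running HDFS service.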