hadoop apache-spark docker hadoop-yarn bigdata

Using Hadoop and Spark on Docker containers


I want to use Big Data analytics for my work. I have already implemented all the Docker parts, creating containers within containers. However, I am new to Big Data, and I have come to understand that using Hadoop for HDFS, and Spark instead of MapReduce on Hadoop itself, is the best approach for websites and applications when speed matters (is it?). Will this work on my Docker containers? It would be very helpful if someone could point me to resources to learn more.


Solution

  • You can try the Cloudera QuickStart Docker image to get started; see https://hub.docker.com/r/cloudera/quickstart/. This image supports a single-node deployment of Cloudera's Hadoop platform and Cloudera Manager, and it includes Spark as well.
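As a rough sketch of how you might launch that image, the Python snippet below assembles a `docker run` command. The flags (`--hostname=quickstart.cloudera`, `--privileged=true`, `-t -i`) and the startup script `/usr/bin/docker-quickstart` are recalled from the image's Docker Hub page and should be checked there before use. The port mapping for Hue on 8888 and the helper names `quickstart_command` and `start` are hypothetical illustrations, not part of the image's documentation.

```python
import shlex
import subprocess
from typing import Dict, List, Optional

# Image name from the Docker Hub page linked above.
IMAGE = "cloudera/quickstart"


def quickstart_command(ports: Optional[Dict[int, int]] = None) -> List[str]:
    """Build the `docker run` argv for the QuickStart container (hypothetical helper).

    `ports` maps host port -> container port. Any mapping passed here is an
    illustrative assumption (e.g. 8888 for Hue), not something the image requires.
    """
    cmd = [
        "docker", "run",
        "--hostname=quickstart.cloudera",  # hostname the bundled services expect (verify on Docker Hub)
        "--privileged=true",               # privileged mode, as suggested on the image page
        "-t", "-i",                        # allocate an interactive terminal
    ]
    # Publish each requested port, in a stable (sorted) order.
    for host_port, container_port in sorted((ports or {}).items()):
        cmd += ["-p", f"{host_port}:{container_port}"]
    # Image name followed by the startup script that boots the Hadoop services.
    cmd += [IMAGE, "/usr/bin/docker-quickstart"]
    return cmd


def start(ports: Optional[Dict[int, int]] = None) -> None:
    """Actually start the container; requires Docker and the pulled image."""
    subprocess.run(quickstart_command(ports), check=True)


if __name__ == "__main__":
    # Only prints the command; call start() to run it for real.
    print(shlex.join(quickstart_command({8888: 8888})))
```

Printing the command first lets you paste it into a shell or adjust port mappings before calling `start()`. Once inside the container you would have HDFS and Spark on a single node to experiment with.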