hadoop mapreduce ipython jupyter

Can I use Hadoop in Jupyter/IPython?


Can I use Hadoop and MapReduce in Jupyter/IPython? Is there something similar to what PySpark is for Spark?


Solution

  • Of course you can. There are many frameworks for this, such as Hadoop Streaming, mrjob, and dumbo, to name a few. The technical side of using them in Jupyter should consist of either subprocess.Popen() calls or ordinary Python imports, depending on the framework.

    A nice overview and critique of some of these frameworks can be found in this Cloudera blog post.
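
To make the subprocess.Popen() route more concrete, here is a minimal sketch of a word count in the Hadoop Streaming style that could live in a notebook cell. It is an illustration under assumptions, not a tested cluster setup: the function names (mapper, reducer, run_locally, streaming_command, submit) are hypothetical, the jar and HDFS paths are placeholders you would supply yourself, and it assumes the hadoop executable is on the PATH and that the mapper and reducer scripts are executable. The local runner only imitates the map, sort, reduce flow so the logic can be checked before a job is submitted.

```python
import subprocess
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated (word, 1) record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)


def run_locally(lines):
    """Imitate map -> sort -> reduce in-process to check the logic in a notebook."""
    return dict(reducer(sorted(mapper(lines))))


def streaming_command(streaming_jar, input_path, output_path,
                      mapper_script, reducer_script):
    """Build the argument list for a Hadoop Streaming job.

    Every path here is a placeholder (assumption); the location of the
    streaming jar depends on the Hadoop installation.
    """
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options such as -files go before the streaming options.
        "-files", f"{mapper_script},{reducer_script}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper_script,
        "-reducer", reducer_script,
    ]


def submit(command):
    """Launch the job with subprocess.Popen and return (exit code, combined output)."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    output, _ = proc.communicate()
    return proc.returncode, output
```

In a notebook you might first call run_locally(["Hello world", "hello Hadoop"]) to sanity-check the counts, then pass the list from streaming_command(...) to submit(...) once the scripts and paths exist on your cluster. Frameworks such as mrjob take the other route mentioned above: you import them and define the job as a Python class instead of shelling out.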