Can I use Hadoop in Jupyter/IPython?

Can I use Hadoop and MapReduce in Jupyter/IPython? Is there something similar to what PySpark is for Spark?

Solution

Of course you can. There are many frameworks for this, such as Hadoop Streaming, mrjob, and dumbo, to name a few. Technically, using them in Jupyter comes down to either subprocess.Popen() calls or ordinary Python imports, depending on the framework.
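As a rough sketch of the subprocess.Popen() route, a notebook cell could assemble a Hadoop Streaming command and launch it as shown here. The jar location, HDFS paths, and the mapper.py/reducer.py script names are placeholders I made up for illustration, and the helper functions are hypothetical, not part of any library. It also assumes the `hadoop` executable is on the PATH of the machine running the notebook kernel.

```python
import subprocess


def build_streaming_command(streaming_jar, input_path, output_path, mapper, reducer):
    """Assemble the argument list for a Hadoop Streaming job (hypothetical helper)."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options such as -files must come before the streaming options.
        "-files", f"{mapper},{reducer}",
        "-input", input_path,
        "-output", output_path,
        # Assumes both scripts are executable and start with a shebang line.
        "-mapper", mapper,
        "-reducer", reducer,
    ]


def run_streaming_job(cmd):
    """Run the command and return (exit code, combined stdout/stderr text)."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    output, _ = proc.communicate()
    return proc.returncode, output


# All paths below are placeholders; adjust them to your own cluster.
cmd = build_streaming_command(
    streaming_jar="/path/to/hadoop-streaming.jar",
    input_path="/user/me/input",
    output_path="/user/me/output",
    mapper="mapper.py",
    reducer="reducer.py",
)
print(" ".join(cmd))

# Uncomment on a machine where Hadoop is installed and configured:
# returncode, log = run_streaming_job(cmd)
# print(returncode)
# print(log)
```

Frameworks such as mrjob are used through a regular Python import instead, so no subprocess call is needed in that case.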

A nice overview and critique of some of these frameworks can be found in this Cloudera blog post.
