Tags: apache-spark, hadoop, hive, apache-kafka, spark-streaming

Is there a way to share/access HDFS among developers?


I am new to big data and Hive. I need to work with another developer on a Spark Streaming app that reads from Kafka and lands the data in Hive/HDFS. The other developer points to the same HDFS location, reads the Hive files, and does further processing.

My development environment is Eclipse on my Windows system. The other developer's environment is Eclipse on his machine.

As we are both working on the same files, is there any way to share the HDFS path between us?

Please share details of how these kinds of scenarios are handled in Spark development teams.

Any advice on best practices, etc. would be appreciated.

Thanks a lot, Shyam


Solution

  • You need to set up a multi-node Hadoop cluster and configure all the developer machines' IPs as DataNodes so that they can share the same HDFS.

    The main Hadoop configuration files are core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml (a minimal core-site.xml sketch follows after this list).

    Once that is done, you can install Hive and Spark on top of HDFS (see the streaming-job sketch after this list).

    Please refer to these links for the setup:
    https://www.linode.com/docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/
    https://dzone.com/articles/setting-up-multi-node-hadoop-cluster-just-got-easy-2
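
For illustration, here is a minimal core-site.xml sketch of the shared-NameNode setting mentioned above; the hostname and port are placeholders, not values from the question. Every developer machine carrying this entry resolves HDFS paths against the same NameNode, which is what lets both of you read and write the same files:

```xml
<!-- core-site.xml on every machine in the cluster.
     "namenode-host" and port 9000 are placeholder values. -->
<configuration>
  <property>
    <!-- All HDFS paths (e.g. /user/shared/events) resolve against this NameNode -->
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:9000</value>
  </property>
</configuration>
```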
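
Once the cluster is up, the pipeline itself could look like the sketch below (Spark Structured Streaming in Scala; the broker, topic, and path names are assumptions for illustration). One developer streams Kafka data into a shared HDFS directory; the other reads it back from the same path:

```scala
import org.apache.spark.sql.SparkSession

// Requires the spark-sql-kafka-0-10 connector on the classpath.
object KafkaToSharedHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-shared-hdfs")
      .enableHiveSupport() // optional: also lets the job manage Hive tables
      .getOrCreate()

    // Read the Kafka topic as a stream (broker/topic names are placeholders).
    val kafka = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; cast them to strings.
    val events = kafka.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Write Parquet to a shared HDFS path. Because fs.defaultFS points at the
    // common NameNode, "/user/shared/events" is the same location for everyone.
    events.writeStream
      .format("parquet")
      .option("path", "/user/shared/events")
      .option("checkpointLocation", "/user/shared/checkpoints/events")
      .start()
      .awaitTermination()
  }
}
```

The other developer can then read the same data from his own session with spark.read.parquet("/user/shared/events"), or define an external Hive table over that directory, since both sessions resolve the path against the shared NameNode.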