hadoop hive apache-pig

Install Hadoop, Pig and Hive on a laptop


I want to install Hadoop, Pig and Hive on my laptop. I don't know how to install and configure them, or what software is required to do it.

Please let me know the exact steps required to install and configure Hadoop, Pig and Hive on a laptop.

I use Windows, and I would like to install Hadoop on Windows.


Solution

  • For beginners, I would recommend sticking to a good prepackaged Hadoop distribution/sandbox. Even if you want to learn how to set up a Hadoop cluster before using the tools it provides (e.g. Hive), setting up a common distribution is a lot easier, at least in the beginning.

    Prepackaged sandboxes for Hadoop run on Linux, but you will most likely not need to do much in Linux to start using Hadoop from one of them. Personally, I think the time you save by avoiding the support and documentation issues of the Windows ports will more than compensate for the added effort of moving to Linux, and you will also get started with Linux, which is itself a tremendously important tool.

    For a prepackaged solution, try the Cloudera quickstart VM or the MapR quickstart VM, as these are the most widely used distributions. By using a sandbox, you skip the installation process (which can be hectic if you don't know what you want, and especially if you aren't familiar with Linux) and jump right into using the tools. Because large vendors such as Cloudera and MapR provide good documentation, you will also face fewer issues in accessing the tools you want to learn.

    Follow the vendor-specific setup guidelines (also listed on the download pages as getting started guides) for further details on setting up the sandbox.

    Once you have the sandbox set up, there are many different ways to access Hive and Pig. You can use the command line interface for Hive (called Beeline). If you are familiar with JDBC, you can access Hive through that. Installing Apache Thrift opens up a much wider range of access options, but you can save that for later.
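
    As a rough illustration of the Beeline and JDBC route, the sketch below only assembles the command you would type in the sandbox terminal; it does not connect to anything. It assumes HiveServer2 is listening on its usual default port 10000 and that a database named default exists, both of which your sandbox may configure differently. The helper names hive_jdbc_url and beeline_command, and the user name your_user, are made up for this example.

```python
import shlex


def hive_jdbc_url(host="localhost", port=10000, database="default"):
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.

    The defaults are assumptions; check your sandbox documentation.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(query, user, url=None):
    """Return the argument list for running a single query through beeline."""
    url = url or hive_jdbc_url()
    # beeline options used here: -u JDBC URL, -n user name, -e query to execute
    return ["beeline", "-u", url, "-n", user, "-e", query]


if __name__ == "__main__":
    # "your_user" is a placeholder; use the account your sandbox provides.
    argv = beeline_command("SHOW DATABASES;", user="your_user")
    # Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
    print(shlex.join(argv))
```

    The same jdbc:hive2:// URL is what a JDBC client would be given, so getting it right once in Beeline makes the later JDBC step easier.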

    I would not recommend learning Pig unless you have very specific uses for it. If you are familiar with Java (or Scala, or even Python, among other options), try writing some MapReduce-style jobs to learn more about how Hadoop works. Open the Ambari (or Cloudera Manager, etc.) interface, which comes pre-configured with these sandboxes, and look at the tools and services that come prepackaged with the sandbox. These are the most common ones and make a useful starting list. Start learning about them (but skip Pig if you can, even if it is pre-installed ;)
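
    To get a feel for what a MapReduce-style job looks like before touching the cluster, here is a minimal word count sketch in plain Python. It runs in a single local process and only imitates the map, shuffle/sort and reduce phases; it is not how Hadoop itself executes jobs. The function names mapper, reducer and run_job are made up for this example.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Simulate map -> shuffle/sort -> reduce in a single local process."""
    # Map: apply the mapper to every input line.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: bring pairs with the same key next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: call the reducer once per distinct key.
    return [
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    ]


if __name__ == "__main__":
    sample = ["hive pig hadoop", "hadoop hive", "hadoop"]
    for word, total in run_job(sample):
        print(f"{word}\t{total}")
```

    On a real cluster the framework takes care of the sorting and grouping between the two phases, so a job mostly consists of the mapper and reducer logic.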

    Once you are familiar with your sandbox, I would suggest moving on to Apache NiFi, which has an easier learning curve and gives you a lot of flexibility. You will most likely have to set up a new sandbox for it, which can also serve as a good revision exercise. Integrate it with your Hadoop sandbox, implement some decent use cases, and you will have some good experience to show.