Tags: hadoop, mapreduce, cloudera, hadoop-yarn

Is CDH4 meant mainly for YARN?


I have several questions, or rather points of confusion, about CDH4. I am posting here because I could not find any concrete information that answers them.

Is CDH4 meant to promote YARN? I tried setting up MapReduce1 (MRv1) from the CDH 4.3.0 tarball. I eventually got it working, but the process was roundabout and painful, whereas the YARN setup is straightforward.

Is anyone using YARN in production at all? Apache clearly states that YARN is still in alpha and not meant for production. In that case, why is Cloudera making CDH4 YARN-centric? Does Cloudera support YARN in production?

Apologies if the questions are inappropriate.

This is what the extracted tarball looks like:

CDH4.3.0 tarball extracts

I followed a couple of links to do the configuration, but I am not happy with the way it had to be done:

CDH 4.3.0 tarball and MR1


Solution

  • No, CDH4 is not meant mainly for YARN. CDH5, on the other hand, will be.

    I'm not sure how you went about setting up your CDH cluster, but it's rather easy to add the MapReduce v1 (MRv1) service, as opposed to YARN, using Cloudera Manager.

    Very few companies use YARN in production, Yahoo being the most notable.

    CDH4 is not YARN-centric. Cloudera includes YARN so that people have access to the most recent Hadoop bits, but Cloudera's website makes it very clear that they do not recommend YARN for production.

    One of the big things that CDH4 brought to the table last year was HDFSv2, and they made MRv1 compatible with it.

    To install CDH4 with MRv1, see here.
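
For readers configuring from the tarball by hand, here is a minimal sketch of the setting that typically distinguishes the two frameworks in `mapred-site.xml`. It is not taken from the linked instructions. The property names `mapred.job.tracker` (MRv1) and `mapreduce.framework.name` (MRv2/YARN) are standard Hadoop settings; the helper `build_mapred_site` and the host and port are hypothetical placeholders, and a real cluster needs more configuration than this.

```python
import xml.etree.ElementTree as ET


def build_mapred_site(properties):
    """Return mapred-site.xml content for the given name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# MRv1: clients locate the JobTracker through mapred.job.tracker.
# The host and port are placeholders (assumptions), not values from the post.
mrv1_site = build_mapred_site({"mapred.job.tracker": "jobtracker.example.com:8021"})

# MRv2: jobs are submitted to YARN when mapreduce.framework.name is "yarn".
yarn_site = build_mapred_site({"mapreduce.framework.name": "yarn"})

print(mrv1_site)
print(yarn_site)
```

Either string could be written to `mapred-site.xml` in the configuration directory of the matching installation. Which directory that is depends on how the tarball was unpacked, so it is left out here.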