Search code examples
javahadoopcascading

Getting cascading.tap.hadoop.io.MultiInputSplit class not found exception while running hadoop program using cascading framework


Here is my code that connects to hadoop machine and perform set of validation and write on another directory.

      public class Main{

            public static void main(String...strings){

        System.setProperty("HADOOP_USER_NAME", "root");
        String in1 = "hdfs://myserver/user/root/adnan/inputfile.txt";
        String out = "hdfs://myserver/user/root/cascading/temp2";

        Properties properties = new Properties();
        AppProps.setApplicationJarClass(properties, Main.class);
        HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

        Tap inTap = new Hfs(new TextDelimited(true, ","), in1);
        Tap outTap = new Hfs(new TextDelimited(true, ","), out);

        Pipe inPipe = new Pipe("in1");  

        Each removeErrors = new Each(inPipe, Fields.ALL, new BigFilter());
        GroupBy group = new GroupBy(removeErrors, getGroupByFields(fieldCols));
        Every mergeGroup = new Every(group, Fields.ALL, new MergeGroupAggregator(fieldCols), Fields.RESULTS);

        FlowDef flowDef = FlowDef.flowDef()
                .addSource(inPipe, inTap)
                .addTailSink(mergeGroup, outTap);

        flowConnector.connect(flowDef).complete();

}

My job is getting submitted to hadoop machine. I can check this on job tracker. but job is getting failed and I am getting exception below.

cascading.tap.hadoop.io.MultiInputSplit not found at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:348) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.io.MultiInputSplit not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493) at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:346) ... 7 more

java.lang.ClassNotFoundException: Class cascading.tap.hadoop.io.MultiInputSplit not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)

Note that : 1. I am running this from my windows machine and hadoop is setup on different box. 2. I am using cloudera distribution for hadoop which is CDH 4.


Solution

  • got the issue. CDH 4.2 has issue with cascading 2.1. So changed to CDH 4.1 and it worked for me.