hadoop, mapreduce, file-read

ArrayIndexOutOfBoundsException in MapReduce


I am getting an ArrayIndexOutOfBoundsException in the map phase. My code is below. I am trying to read the input file from HDFS. Is there a better way to read an HDFS file?

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text>
        {
                private Text key12 = new Text();
                private Text value = new Text();

                public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
                {
                        String line=value.toString();
                        while((line = value.toString()) != null)
                        {
                                        //StringTokenizer tokenizer = new StringTokenizer(line);
                                        //String field = tokenizer.nextToken();
                                        //
                                        String[] parts= line.split(" ");

                                        if(parts[0].contains("STN") == false)
                                        {
                                                String field=parts[0];
                                                String month=parts[3];
                                                String temp;
                                                if(parts[7].trim().equals(""))
                                                {
                                                        temp=parts[8];
                                                }
                                                else
                                                        temp=parts[7];
                                                //tokenizer.nextToken();
                                                //String month = tokenizer.nextToken();

                                                month=month.substring(4,6);
                                                //String temp = tokenizer.nextToken();

                                                String val = month+temp;

                                                key12.set(field);
                                                value.set(val);
                                                output.collect(key12, value);
                                        }
                        }
                }
        }

Solution

  • There are a lot of places where this could go wrong, regardless of which one throws this particular error. What if parts has fewer than 9 elements? What if it has 9 elements but some of them are empty strings (split(" ") produces one for every extra space between fields, and it never produces null)? What if line doesn't contain a space at all? What if month has fewer than six characters, so that substring(4,6) throws?

    Handle all of these situations and your issue will be resolved.

    As an aside, use

     if(!parts[0].contains("STN"))
    

    instead of

     if(parts[0].contains("STN") == false)
    

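    and consider extracting some of your Strings (such as "STN" and " ") into private static final String constants. This will not make the code noticeably faster, since Java already shares identical string literals, but it makes the code easier to read and change.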
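
The checks listed above could be sketched roughly as follows. This is only an illustration, not a drop-in fix: StationLineParser and its constants are hypothetical names, and the column layout (station id at index 0, a yyyymmdd date at index 2, temperature at index 3 once runs of spaces are collapsed) is a guess inferred from the indexes used in the question, so adjust it to the real file. The helper has no Hadoop dependency, so it can be tried outside a job.

```java
import java.util.Optional;

/** Hypothetical helper (not from the original post): parses one input line defensively. */
final class StationLineParser {
    private static final String HEADER_MARKER = "STN";
    // Splitting on runs of whitespace avoids the empty strings that split(" ") produces.
    private static final String WHITESPACE = "\\s+";
    // Assumed column positions after collapsing whitespace; adjust to the real data.
    private static final int STATION_INDEX = 0;
    private static final int DATE_INDEX = 2;
    private static final int TEMP_INDEX = 3;
    private static final int MIN_FIELDS = 4;

    private StationLineParser() {}

    /** Returns {station, month + temperature}, or empty for header or malformed lines. */
    static Optional<String[]> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        String[] parts = trimmed.split(WHITESPACE);
        // Skip the header row, as the original code does.
        if (parts[STATION_INDEX].contains(HEADER_MARKER)) {
            return Optional.empty();
        }
        // Guard every index before using it.
        if (parts.length < MIN_FIELDS) {
            return Optional.empty();
        }
        String date = parts[DATE_INDEX];
        // substring(4, 6) needs at least six characters.
        if (date.length() < 6) {
            return Optional.empty();
        }
        String month = date.substring(4, 6);
        return Optional.of(new String[] { parts[STATION_INDEX], month + parts[TEMP_INDEX] });
    }
}
```

Inside map() you would call StationLineParser.parse(value.toString()) once and simply return when the result is empty. No loop is needed there: with the default text input format the framework calls map() once per line, and the while loop in the question can never end on its own because value.toString() never returns null.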