I am getting an array index out of bounds error in the map part of my job. My code is below. I am trying to read the input file from HDFS. Is there a better way to read an HDFS file?
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text>
{
    private Text key12 = new Text();
    private Text value = new Text();
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
    {
        String line=value.toString();
        while((line = value.toString()) != null)
        {
            //StringTokenizer tokenizer = new StringTokenizer(line);
            //String field = tokenizer.nextToken();
            //
            String[] parts= line.split(" ");
            if(parts[0].contains("STN") == false)
            {
                String field=parts[0];
                String month=parts[3];
                String temp;
                if(parts[7].trim().equals(""))
                {
                    temp=parts[8];
                }
                else
                    temp=parts[7];
                //tokenizer.nextToken();
                //String month = tokenizer.nextToken();
                month=month.substring(4,6);
                //String temp = tokenizer.nextToken();
                String val = month+temp;
                key12.set(field);
                value.set(val);
                output.collect(key12, value);
            }
        }
    }
There are an awful lot of places where this could go wrong, irrespective of where this particular error is. What if parts has fewer than 9 elements? What if it does have 9 elements but some of them are empty strings (which split(" ") produces whenever two spaces are adjacent)? What if line doesn't have a space character in it? What if month only has three characters in it, so that substring(4,6) fails?
Handle all of these situations and your issue should be resolved.
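As a rough illustration of those checks, here is a minimal sketch of the parsing logic pulled out of the mapper into plain Java so it can be tested without Hadoop. The class name SafeLineParser and the method parse are hypothetical, and the index positions (0, 3, 7, 8) are simply carried over from the question; whether they match your data is an assumption. Note that the sketch has no while loop: map() is already called once per input record, value.toString() never returns null so the loop in the question never ends on its own, and after value.set(val) the next pass splits a string with no spaces in it, which is a likely source of the exception.

```java
import java.util.Optional;

class SafeLineParser {
    // Literals from the question, extracted into named constants.
    private static final String HEADER_MARKER = "STN";
    private static final String DELIMITER = " ";

    /**
     * Returns {station, month + temp} for a data line, or an empty
     * Optional when the line is a header or cannot be parsed safely.
     */
    static Optional<String[]> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] parts = line.split(DELIMITER);
        // A line made only of spaces splits into a zero-length array.
        if (parts.length == 0) {
            return Optional.empty();
        }
        // Skip header rows, as the original code does.
        if (parts[0].contains(HEADER_MARKER)) {
            return Optional.empty();
        }
        // parts[3] and parts[7] are read below, so 8 elements are the minimum.
        if (parts.length < 8) {
            return Optional.empty();
        }
        String date = parts[3];
        // substring(4, 6) needs at least six characters.
        if (date.length() < 6) {
            return Optional.empty();
        }
        String temp = parts[7].trim();
        if (temp.isEmpty()) {
            // Fall back to parts[8] only if it actually exists.
            if (parts.length < 9) {
                return Optional.empty();
            }
            temp = parts[8].trim();
        }
        if (temp.isEmpty()) {
            return Optional.empty();
        }
        String month = date.substring(4, 6);
        return Optional.of(new String[] { parts[0], month + temp });
    }
}
```

Inside map() you would call parse(value.toString()) once and, if a result is present, set it on separate output Text objects rather than reusing the input value, then call output.collect. Lines that fail a check are silently skipped here; counting them with the Reporter might be preferable in a real job.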
As an aside, use
if(!parts[0].contains("STN"))
instead of
if(parts[0].contains("STN") == false)
and consider extracting some of your Strings (such as "STN" and " ") into private static final String variables. This will make the code easier to read and maintain, though it is unlikely to change performance, since Java already interns string literals.