EqualsIgnoreCase function - Exception : org.apache.pig.backend.executionengine.ExecException
Input :
a.csv
-------
a
A
(blank/empty line)
b
B
c
C
Objective : To select the records which are 'a', 'A', 'b' and 'B'.
Approach 1 :
A = LOAD 'a.csv' using PigStorage(',') AS (value:chararray);
B = FILTER A BY LOWER(value) IN ('a','b');
DUMP B;
Output :
(a)
(A)
(b)
(B)
Approach 2 :
C = FILTER A BY EqualsIgnoreCase(value, 'a') or EqualsIgnoreCase(value, 'b');
Output :
2015-04-27 23:48:21,958 [Thread-30] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0014
org.apache.pig.backend.executionengine.ExecException
at org.apache.pig.builtin.EqualsIgnoreCase.exec(EqualsIgnoreCase.java:50)
I am trying to understand why this exception is thrown. I understand that it's because of the blank record.
I tried checking that value is not null or empty, but I still get the same error.
D = FILTER A BY (value IS NOT NULL) OR (TRIM(value) != '') AND (EqualsIgnoreCase(value, 'a') or EqualsIgnoreCase(value, 'b'));
Any input or thoughts on achieving our objective using Approach 2 would be much appreciated.
Yes, you are right: the string functions EqualsIgnoreCase and TRIM are not able to handle the blank record in the input.
To solve this issue, what you did in the last statement is nearly right. Remove the TRIM function and combine the null check with the comparisons using AND, and it will work.
C = FILTER A BY (value is not null) and (EqualsIgnoreCase(value, 'a') or EqualsIgnoreCase(value, 'b'));
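The IS NOT NULL condition takes care of the empty record, which is loaded as null, so the TRIM function is not required. Values that contain only spaces or tabs are not null, but EqualsIgnoreCase simply does not match them against 'a' or 'b'.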
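To illustrate why the null check has to come first, here is a rough Java sketch of the same logic. It assumes the blank line reaches the function as null and that the comparison is called on that value directly, which is what the stack trace suggests but is not confirmed here. The class and method names (NullSafeFilterSketch, unguardedMatch, guardedMatch, filter) are made up for the example and are not part of Pig.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Pig code: it only mimics the shape of the filter.
class NullSafeFilterSketch {

    // Stand-in for an unguarded comparison: a null value makes
    // equalsIgnoreCase throw a NullPointerException.
    static boolean unguardedMatch(String value, String target) {
        return value.equalsIgnoreCase(target);
    }

    // Same shape as the corrected FILTER: null check first, combined with AND.
    // Java's && short-circuits, so the comparisons never see a null value.
    static boolean guardedMatch(String value) {
        return value != null
                && (value.equalsIgnoreCase("a") || value.equalsIgnoreCase("b"));
    }

    // Keeps only the records accepted by guardedMatch.
    static List<String> filter(List<String> records) {
        List<String> kept = new ArrayList<>();
        for (String record : records) {
            if (guardedMatch(record)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The null entry stands in for the blank line of a.csv (assumption).
        List<String> records = Arrays.asList("a", "A", null, "b", "B", "c", "C");
        System.out.println(filter(records)); // prints [a, A, b, B]

        try {
            unguardedMatch(null, "a");
        } catch (NullPointerException e) {
            System.out.println("unguarded comparison failed on the null record");
        }
    }
}
```

With OR in place of AND, as in statement D from the question, the comparison can still be reached for the null record, which would explain why the error persisted there.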