I have the following data:
address|some_mask_value
123 Main | 10100011110
124 Main | 10100011100
I am using Apache Pig version 0.15.0.2.4.2.0-258
I'm trying to create an indicator for whether the second-to-last character in 'some_mask_value' is a 1. I've tried:
load_data = LOAD '/myfile.txt' USING PigStorage('|') AS (address:String, some_mask_value:String);
grunt> case_test = FOREACH load_data GENERATE (CASE trial
>> WHEN LAST_INDEX_OF(name, '1') 2 THEN yes
>> ELSE no);
2017-04-20 16:59:50,522 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 5, column 30> mismatched input '2' expecting THEN
Basically, if the second-to-last character is 1, I'll filter that row out later.
-- Pig's string type is chararray
a = load 'data.txt' using PigStorage('|')
as (address: chararray, some_mask_value: chararray);
If the mask field is fixed length, as in your sample data (11 characters), the second-to-last character is at 0-based index 9, so:
b = foreach a generate $0 .. , (
CASE SUBSTRING(some_mask_value, 9, 10)
WHEN '1' THEN 'YES'
ELSE 'NO'
END
) as indicator;
dump b;
(123 Main,10100011110,YES)
(124 Main,10100011100,NO)
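To see where the fixed offsets come from, you can check the mask length with Pig's built-in SIZE. This is a quick sketch; the relation name mask_lengths is just illustrative:

```pig
-- SIZE returns the number of characters in a chararray (as a long)
mask_lengths = foreach a generate some_mask_value, SIZE(some_mask_value) as mask_len;
dump mask_lengths;
```

For an 11-character mask the last character is at index 10 and the second-to-last at index 9, which is why the fixed-length version uses SUBSTRING(some_mask_value, 9, 10).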
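If the mask is not fixed length, compute the position from its length with SIZE:
-- SUBSTRING(str, start, stop) is 0-based and stop is exclusive,
-- so this extracts the single character at index SIZE - 2
b = foreach a generate $0 .. , (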
CASE SUBSTRING(some_mask_value, (int)SIZE(some_mask_value) - 2, (int)SIZE(some_mask_value) - 1)
WHEN '1' THEN 'YES'
ELSE 'NO'
END
) as indicator;
dump b;
(123 Main,10100011110,YES)
(124 Main,10100011100,NO)
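Since the end goal is to drop the rows where the second-to-last character is 1, here is a minimal sketch of that last step. It assumes the relation b from the variable-length version above and keeps only the rows whose indicator is 'NO'; the relation names c and d are illustrative. If you don't need the indicator column itself, Pig's MATCHES operator can do the same check in a single FILTER. MATCHES requires the whole value to match the Java regular expression, so '.*1.' matches any value whose second-to-last character is 1. Unlike the SUBSTRING version, it also simply fails to match values shorter than two characters.

```pig
-- Keep only the rows whose second-to-last mask character is NOT 1
c = filter b by indicator == 'NO';
dump c;

-- Same check without building the indicator column first:
-- '.*1.' = anything, then a 1, then exactly one final character
d = filter a by not (some_mask_value matches '.*1.');
dump d;
```

With the sample data, both c and d should contain only the 124 Main row.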
This assumes the mask field does not have leading or trailing spaces.
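If the file can contain spaces around the delimiter (the sample rows in the question are shown as 123 Main | 10100011110), one option is to strip them with Pig's built-in TRIM before measuring and slicing. This sketch assumes the same relation a; the relation names trimmed and b2 are illustrative:

```pig
-- TRIM removes leading and trailing whitespace, so SIZE and SUBSTRING
-- only see the mask digits
trimmed = foreach a generate TRIM(address) as address,
    TRIM(some_mask_value) as some_mask_value;
b2 = foreach trimmed generate $0 .. , (
    CASE SUBSTRING(some_mask_value, (int)SIZE(some_mask_value) - 2, (int)SIZE(some_mask_value) - 1)
        WHEN '1' THEN 'YES'
        ELSE 'NO'
    END
) as indicator;
dump b2;
```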