
Find when a specific char is 2nd to last in a string in pig


I have the following data:

address|some_mask_value
123 Main | 10100011110
124 Main | 10100011100

I am using Apache Pig version 0.15.0.2.4.2.0-258

I'm trying to create an indicator where the 2nd to last character in 'some_mask_value' is a 1. I've tried:

load_data = LOAD '/myfile.txt' USING PigStorage('|') AS (address:String, some_mask_value:String);

grunt> case_test = FOREACH load_data GENERATE (CASE trial
>> WHEN LAST_INDEX_OF(name, '1') 2 THEN yes
>> ELSE no);

2017-04-20 16:59:50,522 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 5, column 30>  mismatched input '2' expecting THEN

Basically, if the 2nd to last character is 1, then I'll filter out that row later.


Solution

    a = load 'data.txt' using PigStorage('|')
           as (address: chararray, some_mask_value:chararray);
    

    If the mask field has a fixed length, as in your sample data, then:

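    -- SUBSTRING takes a zero-based start index (inclusive) and stop index (exclusive),
    -- so (9, 10) selects the 10th of 11 characters, i.e. the 2nd to last.
    b = foreach a generate $0 .. , (
            CASE SUBSTRING(some_mask_value, 9, 10)
                WHEN '1' THEN 'YES'
                ELSE 'NO'
            END
        ) as indicator;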
    
    dump b;
    (123 Main,10100011110,YES)
    (124 Main,10100011100,NO)
    

    If the mask is not fixed length:

    b = foreach a generate $0 .. , (
            CASE SUBSTRING(some_mask_value, (int)SIZE(some_mask_value) - 2, (int)SIZE(some_mask_value) - 1)
                WHEN '1' THEN 'YES'
                ELSE 'NO'
            END
        ) as indicator;
    dump b;
    (123 Main,10100011110,YES)
    (124 Main,10100011100,NO)
    

    This assumes the mask field does not have leading or trailing spaces.
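The sample data in the question shows spaces around the '|' delimiter. If the real file contains them, PigStorage('|') keeps them as part of each field, which would throw off the index arithmetic. A minimal sketch of one way to handle that, assuming the same 'data.txt' layout: trim both fields first, then apply the variable-length check, then drop the flagged rows as the question intends. The aliases trimmed and c and the field name trimmed_mask are illustrative names, not from the original answer, and masks shorter than two characters are not handled here.

```pig
a = load 'data.txt' using PigStorage('|')
       as (address: chararray, some_mask_value: chararray);

-- TRIM removes leading and trailing whitespace from each field
trimmed = foreach a generate
              TRIM(address) as address,
              TRIM(some_mask_value) as trimmed_mask;

-- same variable-length check, applied to the trimmed mask
b = foreach trimmed generate address, trimmed_mask, (
        CASE SUBSTRING(trimmed_mask, (int)SIZE(trimmed_mask) - 2, (int)SIZE(trimmed_mask) - 1)
            WHEN '1' THEN 'YES'
            ELSE 'NO'
        END
    ) as indicator;

-- keep only the rows whose 2nd to last mask character is not '1'
c = filter b by indicator == 'NO';
```

Trimming in a separate foreach keeps the CASE expression readable. If only the filter is needed, the indicator column could be skipped and the SUBSTRING comparison placed directly in the filter condition.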