Tags: hadoop, mapreduce, apache-pig, cloudera, bigdata

Pig 0.12.0 - extracting last two characters from a string


I am using CDH 5.5 with Pig 0.12.0. I have a chararray like "25 - 45" and I want to extract 25 and 45 from it.

So, I did this:

    minValue = (int)SUBSTRING(value,0,2);
    maxValue = (int)SUBSTRING(value,6,2);

I am able to extract minValue, but not maxValue, i.e. the last two characters of the string.

I also tried the following, but it does not work either:

    maxValue = (int)SUBSTRING(value,-2,2);

Please let me know how to make this work.


Solution

  • If the delimiter is always a hyphen surrounded by spaces (" - "), we can split the chararray with STRSPLIT and flatten the result to extract the min and max values.

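    -- The data contains no commas, so PigStorage(',') loads each whole line into min_max
    A = LOAD 'input.csv' USING PigStorage(',') AS (min_max:chararray);
    -- STRSPLIT splits on the regex ' - ' and returns a tuple; FLATTEN turns it into two fields
    B = FOREACH A GENERATE FLATTEN(STRSPLIT(min_max,' - ',0)) AS (min_val:chararray, max_val:chararray);
    DUMP B;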
    

    Input:

    25 - 45
    35 - 65
    45 - 85
    

    Output:

    (25,45)
    (35,65)
    (45,85)
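The split avoids SUBSTRING entirely. For the question in the title, the likely cause is that Pig's built-in SUBSTRING takes (string, startIndex, stopIndex), with a 0-based start and an exclusive stop, rather than a start and a length. So SUBSTRING(value,6,2) asks for a range that ends before it starts, and a negative start index does not count from the end. The following is a minimal sketch, assuming the same one-range-per-line input as above (the relation names fixed_pos and last_two are illustrative). It extracts the values at fixed positions and, more generally, takes the last two characters by computing the indices from SIZE, which returns a long and is therefore cast to int first.

```pig
-- Assumption: same input as above, one range per line such as "25 - 45".
A = LOAD 'input.csv' USING PigStorage(',') AS (value:chararray);

-- SUBSTRING(string, startIndex, stopIndex): startIndex is 0-based and
-- stopIndex is exclusive. In "25 - 45" the digits sit at indices 0-1 and 5-6.
fixed_pos = FOREACH A GENERATE
    (int)SUBSTRING(value, 0, 2) AS minValue,
    (int)SUBSTRING(value, 5, 7) AS maxValue;

-- Last two characters for any string length: SIZE returns a long,
-- so it is cast to int before being used as a SUBSTRING index.
last_two = FOREACH A GENERATE
    (int)SUBSTRING(value, ((int)SIZE(value)) - 2, (int)SIZE(value)) AS maxValue;

DUMP fixed_pos;
DUMP last_two;
```

The fixed-position version only works while both numbers are exactly two digits. The SIZE-based version keeps working when the first number changes length. If lines can carry trailing whitespace, wrapping value in TRIM before taking the substring is a reasonable precaution.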