Tags: regex, apache-pig, hl7

How to get the last element in Pig Script


I want to get the last element of each line using a Pig script. I can't use a fixed $ position because the index of the last element varies from line to line. I tried $-1, and I tried a regular expression, but neither works. I am posting only a sample, as my actual file contains many more PID segments.

Sample:

MSH|^~\&|LAB|LAB|HEATH|HEA-HEAL|20247||OU^R01|M1738000000001|P|2.3|||ER|ER|
PID|1|YXQ120185751001|YXQ120185751001||ELJKDP@#PDUB||19790615|F||| H LGGH VW^^ZHVW FKHVWHU^SD^19380|||||||4002C340778A|000009561|ELJKDP@#PDUB19790615F

I want to get the last value of the PID line, i.e. ELJKDP@#PDUB19790615F. I have tried the two versions of the code below, but neither works.

Code 1:

STOCK_A = LOAD '/user/rt/PARSED' USING PigStorage('|'); 
data = FILTER STOCK_A BY ($0 matches '.*PID.*'); 
MSH_DATA = FOREACH data GENERATE $2 AS id, $5 AS ame , $7 AS dob, $8 AS gender, $-1 AS rk;

Code 2:

STOCK_A = LOAD '/user/rt/PARSED' USING PigStorage('|'); 
data = FILTER STOCK_A BY ($0 matches '.*PID.*'); 
MSH_DATA = FOREACH data GENERATE $2 AS id, $5 AS ame , $7 AS dob, $8 AS gender, REGEX_EXTRACT(data,'\\s*(\\w+)$',1) AS rk;

Error for Code 2:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: Invalid scalar projection: data : A column needs to be projected from a relation for it to be used as a scalar

Please help


Solution

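  • This should work, provided the first argument is a chararray field holding the whole line (called line here, an assumed name). Passing the relation alias data is what triggers ERROR 1200 in Code 2.

    REGEX_EXTRACT(line,'([^|]+$)',1) AS rk
    

    [^|]+$ matches everything to the right of the last pipe character. For a line that ends with a pipe, such as the MSH line in the sample, there is nothing to match and REGEX_EXTRACT returns null.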
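
    For the regex to see the last field, the whole record has to reach it as one string, so loading with PigStorage('|') gets in the way. A minimal sketch of one way to do this, assuming the file can be read line by line with TextLoader; the names lines, pid and line are illustrative, not from the original post, and the script has not been run against the real data:

    ```pig
    -- Assumed names: lines, pid and the field line are illustrative.
    -- TextLoader reads each input line into a single chararray field,
    -- so the record is not split on '|' before the regex runs.
    lines = LOAD '/user/rt/PARSED' USING TextLoader() AS (line:chararray);

    -- Keep only PID segments; [|] is a literal pipe inside a character class.
    pid = FILTER lines BY line matches 'PID[|].*';

    -- STRSPLIT returns a tuple of the pipe-separated fields, used here for
    -- the fixed positions; REGEX_EXTRACT takes the text after the last pipe.
    MSH_DATA = FOREACH pid GENERATE
        STRSPLIT(line, '[|]').$2 AS id,
        STRSPLIT(line, '[|]').$5 AS ame,
        STRSPLIT(line, '[|]').$7 AS dob,
        STRSPLIT(line, '[|]').$8 AS gender,
        REGEX_EXTRACT(line, '([^|]+)$', 1) AS rk;
    ```

    For the sample PID line, rk should come out as ELJKDP@#PDUB19790615F, as long as the line has no trailing whitespace or carriage return after the last field.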

    Output