
Split characters inside Pig field


I have a text input with '|' separator as

0.0000|25000|                    |BM|BM901002500109999998|SZ

which I split using PigStorage

A = LOAD '/user/hue/data.txt' using PigStorage('|');

Now I need to split the field BM901002500109999998 into separate fields based on character position: positions 0-2 (BM) become Field1, and likewise for the rest. After this step I should get BM, 90100, 2500, 10 and 9999998. Is there a way to do this in a Pig script? Otherwise I plan to write a UDF that inserts a separator at the required positions.

Thanks.


Solution

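  • You are looking for the built-in SUBSTRING function. SUBSTRING(string, startIndex, stopIndex) returns the characters from startIndex (0-based, inclusive) up to stopIndex (exclusive), so each pair of offsets below marks the boundaries of one field: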

    A = LOAD '/user/hue/data.txt' USING PigStorage('|');
    -- $4 is the fifth '|'-separated column (BM901002500109999998 in the sample row)
    B = FOREACH A GENERATE SUBSTRING($4,0,2) AS FIELD_1, SUBSTRING($4,2,7) AS FIELD_2,
        SUBSTRING($4,7,11) AS FIELD_3, SUBSTRING($4,11,13) AS FIELD_4, SUBSTRING($4,13,20) AS FIELD_5;
    

    The output would be:

    dump B;
    (BM,90100,2500,10,9999998)
    

    You can find more information about SUBSTRING in the Apache Pig built-in functions documentation.
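If the field is always exactly 20 characters wide, the same split can also be sketched with Pig's built-in REGEX_EXTRACT_ALL, which returns the capture groups of a regular expression as a tuple; FLATTEN then turns that tuple into separate columns. This is a minimal sketch under that fixed-width assumption: the group widths (2, 5, 4, 2, 7) are taken from the offsets above, and the explicit chararray cast and the named output schema are additions for illustration, not part of the original answer.

```pig
A = LOAD '/user/hue/data.txt' USING PigStorage('|');
-- assumption: $4 is always exactly 20 characters (2 + 5 + 4 + 2 + 7)
-- each (.{n}) capture group becomes one output field
B = FOREACH A GENERATE
        FLATTEN(REGEX_EXTRACT_ALL((chararray)$4, '(.{2})(.{5})(.{4})(.{2})(.{7})'))
        AS (FIELD_1:chararray, FIELD_2:chararray, FIELD_3:chararray,
            FIELD_4:chararray, FIELD_5:chararray);
DUMP B;
```

On the sample row this should yield the same tuple as the SUBSTRING version. Keeping all widths in one pattern makes the layout easier to read, but REGEX_EXTRACT_ALL must match the whole value, so a value of a different length will not be split into the five fields.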