
Extract data into different relations in Pig


I have the single raw file below and need to split its lines into different relations, based on the first character of each line.

If a line starts with 0, the complete line should go to relation 'header'.

If a line starts with 1, the complete line should go to relation 'ban'.

If a line starts with 2, the complete line should go to relation 'sub'.

If a line starts with 3, the complete line should go to relation 'item'.

If a line starts with 4, the complete line should go to relation 'tax'.

0ALH   012012050104.00.00356.0012.06001

1980377362   HAW R 120010000IRN+000016323SABRINA D. ORTIZ                                            PO BOX 1764                                                                                                                                                                                             KAILUA KONA               HI967451764September 2009      03.4June 2008           06.0E   00

2980377362   8089363822    HAW  120010000SABRINA D. ORTIZ                                            75-1027 HENRY ST                                                                                                                                                                                        KAILUA KONA               HI967403154September 2009      03.4June 2008           06.0EN00

2980377362   8089375559    HAW  120010000SABRINA D. ORTIZ                                            75-1027 HENRY ST                                                                                                                                                                                        KAILUA KONA               HI967403154September 2009      03.4June 2008           06.0EN00

3980377362   8089363822             911FEEO      O           SNOTAX1001+000000066201205029-1-1 Service Fee                                                                                                                     0000004950533060000002163C

3980377362   8089363822    GSMUSELASCPKG  R      R           S          00000000020120502Custom Call Package                                                                                                                   000000495053163           

4980377362   8089363822    MSGFTM2AMM2ABUNR     L+000003000U    105      +04160000+000000125 0000000000000000495053186

4980377362   8089363822    MSGFTM2AMM2ABUNR     L+000003000U    131      +00084600+000000003 0000000000000000495053186

4980377362   8089363822    MSGFTM2AMM2ABUNR     L+000003000U    133      +04146600+000000124 0000000000000000495053186

Could you please help me with a Pig script to do this?


Solution

  • Load each line of the file into a single chararray field. Then use SPLIT to compare the first character of every line with the record-type digits you are looking for, and route the line to the matching relation.

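    -- Load every line of the raw file as one chararray field.
    A = LOAD '/path/file.txt' USING TextLoader() AS (line:chararray);
    -- Route each line by its first character. Inside SPLIT the field is
    -- referenced as 'line'; 'A.line' would be treated as a scalar projection.
    SPLIT A INTO header IF SUBSTRING(line, 0, 1) == '0',
                 ban    IF SUBSTRING(line, 0, 1) == '1',
                 sub    IF SUBSTRING(line, 0, 1) == '2',
                 item   IF SUBSTRING(line, 0, 1) == '3',
                 tax    IF SUBSTRING(line, 0, 1) == '4';
    -- Print each relation to the console for inspection.
    DUMP header;
    DUMP ban;
    DUMP sub;
    DUMP item;
    DUMP tax;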
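
DUMP only prints to the console, so for a real file you may prefer to write each relation out. The variant below is a minimal sketch of that: it uses STORE instead of DUMP and adds an OTHERWISE branch to catch lines that start with none of the expected digits. It assumes a Pig version that supports OTHERWISE in SPLIT (0.10 or later, as far as I know). The relation name `unmatched` and the `/path/out/...` directories are placeholders, not anything from the question.

```pig
-- Load every line of the raw file as one chararray field.
A = LOAD '/path/file.txt' USING TextLoader() AS (line:chararray);

-- Route lines by their first character; anything that matches none of
-- the conditions goes to 'unmatched' (hypothetical name).
SPLIT A INTO header    IF SUBSTRING(line, 0, 1) == '0',
             ban       IF SUBSTRING(line, 0, 1) == '1',
             sub       IF SUBSTRING(line, 0, 1) == '2',
             item      IF SUBSTRING(line, 0, 1) == '3',
             tax       IF SUBSTRING(line, 0, 1) == '4',
             unmatched OTHERWISE;

-- Write each relation to its own output directory (placeholder paths).
STORE header    INTO '/path/out/header';
STORE ban       INTO '/path/out/ban';
STORE sub       INTO '/path/out/sub';
STORE item      INTO '/path/out/item';
STORE tax       INTO '/path/out/tax';
STORE unmatched INTO '/path/out/unmatched';
```

Checking the `unmatched` output after a run is a cheap way to spot blank lines or unexpected record types in the input.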