
Extract data in Pig excluding first column


I have unstructured, pipe-delimited data in which records have different numbers of fields:

key1|a1|a11|a21|a31|a41
key2|b1|b11
key3|c1|c11|c21
key4|d1
key2|b101|b111
key1|a101|a111|a121|a131|a141

I split the records based on the first column and distribute them to separate directories:

z = load '/user/input/data.txt' using PigStorage('|');
split z into z1 if $0 == 'key1', z2 if $0 == 'key2', z3 if $0 == 'key3', z4 if $0 == 'key4';
z11 = foreach z1 generate $1,$2,$3,$4,$5;
z22 = foreach z2 generate $1,$2;
z33 = foreach z3 generate $1,$2,$3;
z44 = foreach z4 generate $1;
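To make the routing concrete, here is a rough plain-Python analogy of the SPLIT statement above. It is not Pig code, and the names `SAMPLE`, `KEYS` and `split_by_key` are hypothetical. It groups records by their first field and, like this SPLIT statement, routes nowhere any record whose first field matches none of the listed keys:

```python
from collections import defaultdict

# Hypothetical sample: the input data from the question.
SAMPLE = """key1|a1|a11|a21|a31|a41
key2|b1|b11
key3|c1|c11|c21
key4|d1
key2|b101|b111
key1|a101|a111|a121|a131|a141"""

# Keys named in the SPLIT conditions; records with any other key are dropped.
KEYS = ("key1", "key2", "key3", "key4")


def split_by_key(text, keys=KEYS):
    """Group pipe-delimited records by their first field ($0 in Pig)."""
    buckets = defaultdict(list)
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        if fields[0] in keys:
            buckets[fields[0]].append(fields)
    return dict(buckets)


if __name__ == "__main__":
    for key, rows in split_by_key(SAMPLE).items():
        print(key, rows)
```

Note that each bucket still holds variable-length rows, which is why a fixed list of positions per bucket is fragile.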

For the input record key1|a1|a11|a21|a31|a41, I need the output to be "a1|a11|a21|a31|a41", that is, everything except "key1".

I can get these values by specifying each position explicitly:

z11 = foreach z1 generate $1,$2,$3,$4,$5;

Is there a way to extract this data without specifying the positions?


Solution

  • If you don't know exactly how many fields you have, you can use the project-range syntax:

    z11 = foreach z1 generate $1..;
    z22 = foreach z2 generate $1..;
    z33 = foreach z3 generate $1..;
    z44 = foreach z4 generate $1..;
    

    This drops the first field ($0) and keeps every field from the second one ($1) to the end, without listing each of them explicitly.
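As a loose analogy in plain Python (not Pig), `$1..` behaves like the slice `fields[1:]`: it keeps everything after the first field, however many fields a record has. The helper names `project_from_second` and `strip_key` are hypothetical:

```python
def project_from_second(fields):
    """Mimic Pig's $1.. projection: keep every field from index 1 to the end."""
    return fields[1:]


def strip_key(line, delimiter="|"):
    """Drop the leading key field and re-join the rest with the same delimiter."""
    return delimiter.join(project_from_second(line.split(delimiter)))


if __name__ == "__main__":
    for record in ["key1|a1|a11|a21|a31|a41", "key2|b1|b11", "key4|d1"]:
        print(strip_key(record))
```

Run as a script, this prints a1|a11|a21|a31|a41, b1|b11 and d1, one per line: the same projection works for records of any length.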