apache-pig

How do I split a chararray field that contains spaces and tabs between words in Apache Pig?


Sample.txt File

2017-01-01 10:21:59 THURSDAY    -39 3 Pick up a bus - Travel for two hours
2017-02-01 12:45:19 FRIDAY  -55 8 Pick up a train - Travel for one hour
2017-03-01 11:35:49 SUNDAY  -55 8 Pick up a train - Travel for one hour
...

When I ran the suggested command, each line was split into three fields.

However, the operation below does not work as expected.

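A = LOAD 'Sample.txt' USING PigStorage() as (line:chararray);
-- FLATTEN turns the tuple returned by STRSPLIT into top-level fields, so $2 is the third piece
B = foreach A generate FLATTEN(STRSPLIT(line, ' ', 3));
-- aliases are case-sensitive: this relation must be named C to match the split below
C = foreach B generate $2;
split C into buslog if $0 matches '.*bus*.', trainlog if $0 matches '.*train*.';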

Note: DUMP C gives the result below.

THURSDAY    -39 3 Pick up a bus - Travel for two hours
FRIDAY  -55 8 Pick up a train - Travel for one hour
SUNDAY  -55 8 Pick up a train - Travel for one hour

Requirement: From the result above, I want to split the train rows and the bus rows into two separate relations, but buslog and trainlog do not come out as expected.


Solution

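  • The pattern syntax is .*string.* with .* on both sides of the string. Pig's matches operator takes a Java regular expression and succeeds only when the pattern matches the entire field. In '.*bus*.', the ending *. means "zero or more s, then exactly one more character, then end of line", so none of the sample rows match.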

    split C into buslog if $0 matches '.*bus.*', trainlog if $0 matches '.*train.*';
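
The difference between the two patterns can be checked outside Pig. The following is a minimal sketch in plain Java, assuming (as the Pig documentation describes) that matches follows Java regular-expression rules and must match the whole value. The class name MatchesDemo and the helper fullMatch are made up for this illustration and are not part of Pig.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows why '.*bus*.' fails where '.*bus.*' succeeds.
class MatchesDemo {
    // Whole-string regex match, the behavior assumed here for Pig's matches operator.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String busRow = "THURSDAY    -39 3 Pick up a bus - Travel for two hours";
        String trainRow = "FRIDAY  -55 8 Pick up a train - Travel for one hour";

        // Broken pattern: the row would have to END with "bu", optional "s", one character.
        System.out.println(fullMatch(".*bus*.", busRow));     // false
        System.out.println(fullMatch(".*train*.", trainRow)); // false

        // Corrected pattern: "bus" or "train" may appear anywhere in the row.
        System.out.println(fullMatch(".*bus.*", busRow));     // true
        System.out.println(fullMatch(".*train.*", trainRow)); // true

        // The corrected patterns also keep the two groups apart.
        System.out.println(fullMatch(".*bus.*", trainRow));   // false

        // The broken pattern only matches values such as "a bus!" or "but".
        System.out.println(fullMatch(".*bus*.", "a bus!"));   // true
        System.out.println(fullMatch(".*bus*.", "but"));      // true
    }
}
```

With the corrected patterns, rows containing "bus" should land in buslog and rows containing "train" in trainlog. A row containing both words would go to both relations, because SPLIT evaluates each condition independently.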