Sample data (TSV file: sampl):
1 a
2 b
3 c
raw= load 'sampl' using PigStorage() as (f1:chararray,f2:chararray);
grouped = group raw by f1;
describe grouped;
fields = foreach grouped {
x = sample raw 1;
generate x;
}
When I run this, I get an error at the line x = sample raw 1;
ERROR 1200: mismatched input 'raw' expecting LEFT_PAREN
Is sampling not allowed for a grouped record?
You can't use the SAMPLE operator inside a nested block; Pig does not support this.
Only a few operators (CROSS, DISTINCT, FILTER, FOREACH, LIMIT, and ORDER BY) are allowed in a nested block. You have to use SAMPLE outside of the nested block.
The other problem is that you are loading your input data with the default delimiter, i.e. tab, but your input data is delimited with spaces, so you need to change your load statement like this:
raw= load 'sampl' using PigStorage(' ') as (f1:chararray,f2:chararray);
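If what you want is a sample of the whole relation, or one random row per group, something along these lines may work. This is only a sketch: the fraction 0.5 is an arbitrary value, and the aliases sampled, keyed, shuffled, picked and the field rnd are made-up names for illustration. The per-group part relies only on ORDER BY and LIMIT, which are allowed in a nested block, plus the built-in RANDOM() function.

```pig
raw = LOAD 'sampl' USING PigStorage(' ') AS (f1:chararray, f2:chararray);

-- Option 1: SAMPLE at the top level, outside any nested block.
-- The argument is a fraction of rows to keep (0.5 is an arbitrary choice).
sampled = SAMPLE raw 0.5;

-- Option 2: roughly one random row per group.
-- Attach a random key to every row before grouping.
keyed = FOREACH raw GENERATE f1, f2, RANDOM() AS rnd;
grouped = GROUP keyed BY f1;

picked = FOREACH grouped {
    -- ORDER BY and LIMIT are both permitted inside a nested block.
    shuffled = ORDER keyed BY rnd;
    x = LIMIT shuffled 1;
    GENERATE FLATTEN(x.(f1, f2));
};

DUMP picked;
```

With the three-row sample above every f1 value is unique, so each group holds a single row and picked simply returns all rows; the random choice only matters once a group contains more than one row.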