I came across two scripts that do the same job: calculating the percentage of each value in Pig.
Script1
total = FOREACH (GROUP A ALL) GENERATE COUNT(A);
rows = FOREACH (GROUP A by $0) GENERATE group as colname, COUNT(A) as cnt;
fractions = FOREACH rows GENERATE colname, cnt/(double)total.$0;
Script2
test = LOAD 'test.txt' USING PigStorage(',') AS (one:chararray,two:int);
B = GROUP test by $0;
C = FOREACH B GENERATE group, COUNT(test.$0);
D = GROUP test ALL;
E = FOREACH D GENERATE group,COUNT(test.$0);
F = CROSS C,E;
G = FOREACH F GENERATE $0,(double)($1*100/$3);
At first glance, Script1 looks more efficient than Script2.
I want to know whether there are tools, like VisualVM or JProfiler for Java, that measure the performance of Pig scripts.
The time taken to run a script is one way to measure it, but are there tools built for this purpose?
Use the EXPLAIN command to see the MapReduce plan for each script, then compare the two plans against some general rules (there can be variations).
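As a minimal sketch of that approach, the snippet below runs EXPLAIN on the final alias of Script1. The LOAD statement for A is an assumption, since Script1 does not show where A comes from; here it reuses the 'test.txt' input and schema from Script2.

```pig
-- Assumption: A is loaded from the same comma-separated file that Script2 uses.
A = LOAD 'test.txt' USING PigStorage(',') AS (one:chararray, two:int);

-- Script1, unchanged
total = FOREACH (GROUP A ALL) GENERATE COUNT(A);
rows = FOREACH (GROUP A BY $0) GENERATE group AS colname, COUNT(A) AS cnt;
fractions = FOREACH rows GENERATE colname, cnt/(double)total.$0;

-- Prints the logical, physical and MapReduce plans without running the job
EXPLAIN fractions;
```

For Script2, the equivalent is to append `EXPLAIN G;` after the last statement. One reasonable thing to compare, offered here as a suggestion rather than a fixed rule, is how many MapReduce jobs each plan contains and which operators (for example the CROSS in Script2) end up in them.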