I've two CSV files:
1- Fertiltiy.csv :
2- Life Expectency.csv :
I want to join them in pig so that the result will be like this:
I am new to pig, I couldn't get the correct answer, but here is my code:
fertility = LOAD 'fertility' USING org.apache.hcatalog.pig.HCatLoader();
lifeExpectency = LOAD 'lifeExpectency' USING org.apache.hcatalog.pig.HCatLoader();
A = JOIN fertility by country, lifeExpectency by country;
B = JOIN fertility by year, lifeExpectency by year;
C = UNION A,B;
DUMP C;
Here is the result of my code:
You have the join by country and year and select the necessary columns needed for your final output.
fertility = LOAD 'fertility' USING org.apache.hcatalog.pig.HCatLoader();
lifeExpectency = LOAD 'lifeExpectency' USING org.apache.hcatalog.pig.HCatLoader();
A = JOIN fertility by (country,year), lifeExpectency by (country,year);
B = FOREACH A GENERATE fertility::country,fertility::year,fertility::fertility,lifeExpectency::lifeExpectency;
DUMP B;