Tags: ubuntu, hadoop, apache-pig

Pig Latin unable to dump


I am unable to use the dump operator on the alias 'TMP'. Loading and dumping the two input files individually worked fine, and describe also works for both relations as well as for 'TMP'. I have manually set the path to point at the folder, so there shouldn't be any path issues, and the files are present on the HDFS server.

orderdetails = load 'order_details.tbl' using PigStorage('|') as
    (ORDER_ID:int, PRODUCT_ID:int, CUSTOMER_ID:int, SALESPERSON_ID:int,
     UNIT_PRICE:float, QUANTITY:int, DISCOUNT:float);
dump orderdetails;
describe orderdetails;

salesperson = load 'salesperson.tbl' using PigStorage('|') as
    (EMPLOYEE_ID:int, LASTNAME:chararray, FIRSTNAME:chararray, TITLE:chararray,
     BIRTHDATE:chararray, HIREDATE:chararray, NOTES:chararray);
dump salesperson;
describe salesperson;

TMP = join salesperson by EMPLOYEE_ID, orderdetails by SALESPERSON_ID;
dump TMP;
describe TMP;

Error:

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_1622041116748_0009  TMP,orderdetails,salesperson    HASH_JOIN   Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:8020/user/bigdata/A3/order_details.tbl
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
    at java.lang.Thread.run(Thread.java:748)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:8020/user/bigdata/A3/order_details.tbl
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
    ... 18 more
    hdfs://localhost:8020/tmp/temp441281403/tmp921242953,

Input(s):
Failed to read data from "hdfs://localhost:8020/user/bigdata/A3/salesperson.tbl"
Failed to read data from "hdfs://localhost:8020/user/bigdata/A3/order_details.tbl"

Output(s):
Failed to produce result in "hdfs://localhost:8020/tmp/temp441281403/tmp921242953"

Solution

  • As the error states, the input path (hdfs://localhost:8020/user/bigdata/A3/order_details.tbl) does not exist on HDFS. Point the LOAD statement at the correct path:

    Relation_name = LOAD 'Input file path' USING function as schema;
    

    It might look like

    student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',') as
        (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
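
  • Note that the script loads the files by relative path ('order_details.tbl'), which Pig resolves against the current HDFS working directory (here /user/bigdata/A3, per the error message). A quick way to verify from the Grunt shell is sketched below; the paths are taken from the error message, so adjust them to your own layout:

        -- List the directory Pig is resolving against, to confirm the
        -- .tbl files actually exist there:
        fs -ls /user/bigdata/A3

        -- If they are missing, copy them in from the local filesystem:
        fs -copyFromLocal order_details.tbl /user/bigdata/A3/
        fs -copyFromLocal salesperson.tbl /user/bigdata/A3/

        -- Loading by absolute HDFS path avoids any dependence on the
        -- current working directory:
        orderdetails = LOAD 'hdfs://localhost:8020/user/bigdata/A3/order_details.tbl'
            USING PigStorage('|') AS
            (ORDER_ID:int, PRODUCT_ID:int, CUSTOMER_ID:int, SALESPERSON_ID:int,
             UNIT_PRICE:float, QUANTITY:int, DISCOUNT:float);

    Because dump on a plain load can succeed while a later join fails, it is worth re-checking the paths at the point the join job is actually submitted.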