hadoop, hdfs, apache-pig

Pig Error trying Edureka's Tutorial


I've been trying to run Hadoop and related components such as Pig.

I'm trying this tutorial: https://www.edureka.co/blog/pig-programming-create-your-first-apache-pig-script/

Everything seems to be set up correctly, but when I run the script at step 2 it throws this error:

2018-01-09 13:47:20,682 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:output.pig got an error while submitting 
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:9000/carlos/information.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
    at java.lang.Thread.run(Thread.java:748)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)

Solution

  • Before step 1, you needed to run this command to copy the input file into HDFS:

    hadoop fs -copyFromLocal /home/carlos/information.txt /carlos
    

    However, if the /carlos directory doesn't already exist on HDFS, this copies your file to a file named /carlos.

    If you want /carlos to be a directory, you need to delete that file and create the directory before copying again:

    hadoop fs -rm /carlos
    hadoop fs -mkdir /carlos
    

    Also, it's generally a good idea to use a trailing slash when copying files into a directory, like so:

    hadoop fs -copyFromLocal /home/carlos/information.txt /carlos/
    

    Alternatively, you could just make your Pig code load /carlos as the file. Even if it were a directory, this would still work, because Pig reads all the files in it.
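
The steps above could be combined into one small script. This is only a rough sketch, assuming the paths from the question; the function name upload_to_hdfs_dir is made up for illustration and is not part of Hadoop. Note that it deletes whatever plain file is sitting at the target path, so check that this is what you want before running it.

```shell
#!/usr/bin/env bash
# Sketch: make sure the HDFS target is a directory, then upload a local file into it.
# upload_to_hdfs_dir is a hypothetical helper name, not a Hadoop command.

upload_to_hdfs_dir() {
    local local_file="$1"   # e.g. /home/carlos/information.txt
    local hdfs_dir="$2"     # e.g. /carlos

    # "-test -d" succeeds only if the path exists and is a directory.
    if ! hadoop fs -test -d "$hdfs_dir"; then
        # "-test -e" succeeds if the path exists at all, so at this point
        # a plain file is in the way (the situation described above).
        if hadoop fs -test -e "$hdfs_dir"; then
            hadoop fs -rm "$hdfs_dir" || return 1
        fi
        hadoop fs -mkdir -p "$hdfs_dir" || return 1
    fi

    # The trailing slash makes it explicit that the target is a directory.
    hadoop fs -copyFromLocal "$local_file" "$hdfs_dir/" || return 1

    # List the directory so you can confirm the file is where Pig expects it.
    hadoop fs -ls "$hdfs_dir/"
}
```

After sourcing the script, you would call it as `upload_to_hdfs_dir /home/carlos/information.txt /carlos`. If the listing shows /carlos/information.txt, the path in the error message should now resolve.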