
Import data from mysql into HDFS using Sqoop


I am using Hadoop 1.2.1 and Sqoop 1.4.6, and I am importing the table `test` from the database `meshtree` into HDFS with this command:

`sqoop import --connect jdbc:mysql://localhost/meshtree --username user --password password --table test`

But, it shows this error:

```
17/06/17 18:15:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/06/17 18:15:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/06/17 18:15:21 INFO tool.CodeGenTool: Beginning code generation
17/06/17 18:15:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
17/06/17 18:15:22 INFO orm.CompilationManager: HADOOP_HOME is /home/student/Installations/hadoop-1.2.1/libexec/..
Note: /tmp/sqoop-student/compile/6bab6efaa3dc60e67a50885b26c1d14b/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/17 18:15:24 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-student/compile/6bab6efaa3dc60e67a50885b26c1d14b/test.java to /home/student/Installations/hadoop-1.2.1/./test.java
org.apache.commons.io.FileExistsException: Destination '/home/student/Installations/hadoop-1.2.1/./test.java' already exists
    at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:2378)
    at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:367)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
17/06/17 18:15:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-student/compile/6bab6efaa3dc60e67a50885b26c1d14b/test.jar
17/06/17 18:15:24 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/06/17 18:15:24 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/06/17 18:15:24 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/06/17 18:15:24 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/06/17 18:15:24 INFO mapreduce.ImportJobBase: Beginning import of test
17/06/17 18:15:27 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/home/student/Installations/hadoop-1.2.1/data/mapred/staging/student/.staging/job_201706171814_0001
17/06/17 18:15:27 ERROR security.UserGroupInformation: PriviledgedActionException as:student cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory test already exists
17/06/17 18:15:27 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory test already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:973)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:97)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
```

How can I fix this problem?


Solution

    • You are importing without specifying an HDFS target directory. When no target directory is given, Sqoop creates an output directory in HDFS named after the MySQL table, so the import can only run once: any later run fails with `FileAlreadyExistsException` because that directory already exists.

    So your command

    `sqoop import --connect jdbc:mysql://localhost/meshtree --username user --password password --table test`

    creates a directory named `test` in HDFS, and that directory is still there from your first run.

    • Either delete the existing `test` directory, or add `--target-dir` to write the output somewhere else:

    `sqoop import --connect jdbc:mysql://localhost/meshtree --username user --password password --table test --target-dir test1`

    Hopefully this works fine; for more detail, see the `sqoop import` section of the Sqoop documentation.
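Both errors in the log are leftovers of the first run: the generated `test.java` sitting in the Hadoop directory (the `FileExistsException`) and the `test` output directory in HDFS (the `FileAlreadyExistsException`). A cleanup sketch, assuming the paths reported in the error messages above and the Hadoop 1.x shell syntax:

```shell
# Remove the stale generated source file that blocks CodeGenTool's rename
# (this path is taken from the FileExistsException in the log):
rm -f /home/student/Installations/hadoop-1.2.1/test.java

# Remove the previous import's output directory in HDFS
# (Hadoop 1.x syntax; newer versions use `hadoop fs -rm -r`):
hadoop fs -rmr test

# Re-run the import. Sqoop 1.4.6 also has a --delete-target-dir option
# that drops an existing output directory automatically:
sqoop import --connect jdbc:mysql://localhost/meshtree \
    --username user --password password --table test --delete-target-dir
```

Deleting the old output discards the previously imported data, so use `--target-dir` instead if you want to keep both copies.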