
sqoop import query gives duplicate name error


I have three tables in MySQL named movie, moviegenre, and genre. When I try to import them into HDFS using a Sqoop free-form query:

sqoop import --connect jdbc:mysql://localhost/movielens --username user --password *** \
--query 'select m.id as id, m.name as movie_name, m.year as year, g.name as genre_name
from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid
WHERE $CONDITIONS' --split-by m.id --target-dir /user/sqoop/moviegenre

it throws the following error:

Imported Failed: Duplicate Column identifier specified: 'name'.

When I run the same query in MySQL, it gives me exactly the output I want:

id       movie_name         year   genre_name
 1       Toy Story          1995   Animation 
 2       Jumanji            1995   Adventure
 ..      .......            ....   ........  

I followed this link and did exactly as its answer suggests: Imported Failed: Duplicate Column identifier specified (sqoop), but that didn't seem to help either.

The fields in the tables are as follows: movie = id, name, year

genre = id, name

moviegenre = movieid, genreid

Please point out the mistake in my query.


Solution

  • There aren't any errors in the Sqoop command. I created the same tables in my Cloudera QuickStart VM and ran the Sqoop import; it worked fine and produced the expected results. It's likely that you ran the command before adding the column aliases. The only other difference is that I formatted the command across multiple lines.

    In case you needed it, below is what I ran

    MySQL Command

    mysql> use retail_db;
    mysql> create table movie (id integer, name varchar(100), year integer);
    mysql> insert into movie values (1, 'TEST', 2016);
    mysql> create table genre (id integer, name varchar(100));
    mysql> insert into genre values (1, 'TEST');
    mysql> create table moviegenre (movieid integer, genreid integer);
    mysql> insert into moviegenre values (1, 1);
    

    SQOOP Command

    sqoop import \
    --connect jdbc:mysql://localhost/retail_db \
    --username root \
    --password cloudera \
    --query 'select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE $CONDITIONS' \
    --split-by m.id \
    --target-dir /user/cloudera/moviegenre
    

    SQOOP Standard output

    [cloudera@quickstart ~]$ sqoop import \
    > --connect jdbc:mysql://localhost/retail_db \
    > --username root \
    > --password cloudera \
    > --query 'select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE $CONDITIONS' --split-by m.id --target-dir /user/cloudera/moviegenre
    Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    17/11/19 13:08:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.0
    17/11/19 13:08:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    17/11/19 13:08:01 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    17/11/19 13:08:01 INFO tool.CodeGenTool: Beginning code generation
    17/11/19 13:08:02 INFO manager.SqlManager: Executing SQL statement: select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE  (1 = 0) 
    17/11/19 13:08:02 INFO manager.SqlManager: Executing SQL statement: select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE  (1 = 0) 
    17/11/19 13:08:02 INFO manager.SqlManager: Executing SQL statement: select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE  (1 = 0) 
    17/11/19 13:08:02 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
    Note: /tmp/sqoop-cloudera/compile/3b35f51458e53da94c6852dcfc0b904a/QueryResult.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    17/11/19 13:08:04 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/3b35f51458e53da94c6852dcfc0b904a/QueryResult.jar
    17/11/19 13:08:04 INFO mapreduce.ImportJobBase: Beginning query import.
    17/11/19 13:08:04 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    17/11/19 13:08:04 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    17/11/19 13:08:05 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    17/11/19 13:08:05 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    17/11/19 13:08:08 WARN hdfs.DFSClient: Caught exception 
    java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1281)
        at java.lang.Thread.join(Thread.java:1355)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:951)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:689)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:878)
    17/11/19 13:08:08 INFO db.DBInputFormat: Using read commited transaction isolation
    17/11/19 13:08:08 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(t1.id), MAX(t1.id) FROM (select m.id as id,m.name as movie_name,m.year as year, g.name as genre_name from movie m join moviegenre mg on m.id = mg.movieid join genre g on g.id = mg.genreid WHERE  (1 = 1) ) AS t1
    17/11/19 13:08:08 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 1 to: 1
    17/11/19 13:08:08 INFO mapreduce.JobSubmitter: number of splits:1
    17/11/19 13:08:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510865312807_0011
    17/11/19 13:08:09 INFO impl.YarnClientImpl: Submitted application application_1510865312807_0011
    17/11/19 13:08:09 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1510865312807_0011/
    17/11/19 13:08:09 INFO mapreduce.Job: Running job: job_1510865312807_0011
    17/11/19 13:08:33 INFO mapreduce.Job: Job job_1510865312807_0011 running in uber mode : false
    17/11/19 13:08:33 INFO mapreduce.Job:  map 0% reduce 0%
    17/11/19 13:08:52 INFO mapreduce.Job:  map 100% reduce 0%
    17/11/19 13:08:52 INFO mapreduce.Job: Job job_1510865312807_0011 completed successfully
    17/11/19 13:08:52 INFO mapreduce.Job: Counters: 30
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=148008
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=99
            HDFS: Number of bytes written=17
            HDFS: Number of read operations=4
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Other local map tasks=1
            Total time spent by all maps in occupied slots (ms)=16886
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=16886
            Total vcore-seconds taken by all map tasks=16886
            Total megabyte-seconds taken by all map tasks=17291264
        Map-Reduce Framework
            Map input records=1
            Map output records=1
            Input split bytes=99
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=702
            CPU time spent (ms)=2330
            Physical memory (bytes) snapshot=231743488
            Virtual memory (bytes) snapshot=1567064064
            Total committed heap usage (bytes)=221249536
        File Input Format Counters 
            Bytes Read=0
        File Output Format Counters 
            Bytes Written=17
    17/11/19 13:08:52 INFO mapreduce.ImportJobBase: Transferred 17 bytes in 46.7631 seconds (0.3635 bytes/sec)
    17/11/19 13:08:52 INFO mapreduce.ImportJobBase: Retrieved 1 records.
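
As a side note, the underlying cause of the original error is easy to reproduce outside Sqoop: both movie and genre have a column called name, so without aliases the join result contains two identically named columns, which Sqoop cannot map to distinct fields in its generated class. A minimal sketch using Python's sqlite3 module (an assumption here, standing in for MySQL) shows the difference aliasing makes:

```python
import sqlite3

# Recreate the three tables from the question in an in-memory database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE movie (id INTEGER, name TEXT, year INTEGER);
    CREATE TABLE genre (id INTEGER, name TEXT);
    CREATE TABLE moviegenre (movieid INTEGER, genreid INTEGER);
    INSERT INTO movie VALUES (1, 'Toy Story', 1995);
    INSERT INTO genre VALUES (1, 'Animation');
    INSERT INTO moviegenre VALUES (1, 1);
""")

# Without aliases, both result columns are reported as 'name' --
# this is the ambiguity Sqoop rejects with "Duplicate Column identifier".
cur.execute(
    "SELECT m.name, g.name FROM movie m "
    "JOIN moviegenre mg ON m.id = mg.movieid "
    "JOIN genre g ON g.id = mg.genreid")
cols_without_alias = [d[0] for d in cur.description]
print(cols_without_alias)   # ['name', 'name']

# With aliases, as in the working command above, every column is unique.
cur.execute(
    "SELECT m.name AS movie_name, g.name AS genre_name FROM movie m "
    "JOIN moviegenre mg ON m.id = mg.movieid "
    "JOIN genre g ON g.id = mg.genreid")
cols_with_alias = [d[0] for d in cur.description]
print(cols_with_alias)      # ['movie_name', 'genre_name']
```

The same rule applies to any free-form Sqoop query: every selected column in the result set must have a unique identifier, which is why aliasing m.name and g.name resolves the error.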