hadoop, sqoop, amazon-emr, sqoop2

How to increase the number of mappers in a Sqoop job


I am trying to load data from S3 into RDS using Sqoop. I have approximately 35 GB of gzip files, split across 70 files.

Here is the command I run to do that:

sqoop export \
 --connect jdbc:mysql://a205067-pppp-ec2rds.abcd.us-east-1.rds.amazonaws.com/tprdb \
 --username user \
 --password password \
 --table DnB_WB_UniverseMaster \
 --export-dir s3://pppp-sukesh/FullFiles/ \
 --fields-terminated-by '|' \
 --num-mappers 500 \
 --direct \
 --default-character-set=latin1

dunsnumber is my primary key.

The issue is that the export is very slow, and I can see only 4 mappers running.

What optimization can I do here to make the load faster?

Also, my EMR cluster has 10 m4.large instances.


Solution

  • Try using a single dash with the short form of the argument, -m 20, or two dashes with the long form, --num-mappers 20.
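
As a rough sketch of that suggestion, the command from the question could be rebuilt with the short -m form. The value 20 is only the illustrative figure from the answer, not a tuned recommendation, and the --default-character-set option from the question is left out here for brevity. The MAPPERS variable and the cmd array are names made up for this example; the script only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: the export from the question with an explicit mapper count.
# MAPPERS is a hypothetical variable; 20 is an example value only.
MAPPERS=20

# Build the command as an array so it can be inspected before it is run.
cmd=(
  sqoop export
  --connect "jdbc:mysql://a205067-pppp-ec2rds.abcd.us-east-1.rds.amazonaws.com/tprdb"
  --username user
  --password password
  --table DnB_WB_UniverseMaster
  --export-dir "s3://pppp-sukesh/FullFiles/"
  --fields-terminated-by '|'
  -m "$MAPPERS"          # short form; equivalent to --num-mappers "$MAPPERS"
  --direct
)

# Dry run: print the command. Replace this line with "${cmd[@]}" to execute it.
echo "${cmd[@]}"
```

Whether all 20 map tasks actually run at once also depends on how many containers the cluster can schedule, so it is worth checking the job's mapper count in the YARN UI after the change.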