hadoop mapreduce sqoop

Concurrency in Sqoop


I have read documentation recommending that Sqoop be installed on an edge node, for reasons I understand, and stating that every mapper establishes its own connection to the source database. My question: are all 4 connections established from the edge node, or does the Sqoop client on the edge node just create some kind of driver that monitors the ingestion, while the data nodes connect to the database, fetch their part of the data, split it locally, and then put it in HDFS?


Solution

  • Sqoop is a wrapper over MapReduce that performs import and export operations. The mapper connections are opened from the cluster nodes where the mappers run, not from the edge node.

    1. The mappers run in your cluster, while the Sqoop client runs on the edge node.
    2. Each mapper opens its own connection to your database.
    3. Which rows each mapper consumes is decided by the client when it submits the job.
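
To make point 3 concrete, here is a minimal Python sketch of how a client could divide a numeric key range among mappers before submitting the job. It is an illustration only, not Sqoop's actual splitter code: the function names `split_ranges` and `where_clauses`, the column name `id`, and the even-division rule are assumptions made for the example. In a real import the number of mappers and the split column are typically set with `--num-mappers` and `--split-by`.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive range [min_id, max_id] into contiguous chunks.

    Hypothetical helper: mimics the idea of the client computing one
    range per mapper, not Sqoop's exact algorithm.
    """
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    lo = min_id
    for i in range(num_mappers):
        # The first `extra` chunks take one additional row each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more mappers than rows: skip empty chunks
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def where_clauses(column, ranges):
    """Build the filter each mapper would append to its own query."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]


if __name__ == "__main__":
    # Assumed boundary values, as if read from the source table by the client.
    for clause in where_clauses("id", split_ranges(1, 100, 4)):
        print(clause)
```

Each resulting clause would be handed to a different mapper, which then runs its query over its own database connection from whichever cluster node it is scheduled on.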