To replicate data from my database to S3, I am using the command below.
sqoop import \
-D mapreduce.job.name=xxx \
-D mapred.job.queue.name=user \
-D hadoop.security.credential.provider.path=<path> \
-D fs.s3a.server-side-encryption-algorithm=<xx> \
--options-file <path> \
--query "select col1, ID, UPDATETIME from db.table where UPDATETIME between to_date('2015-09-11 00:00:00','yyyy-mm-dd hh24:mi:ss') and to_date('2018-05-24 04:28:16','yyyy-mm-dd hh24:mi:ss') and \$CONDITIONS" \
--hive-delims-replacement ' ' \
--direct \
-m 1 \
--split-by ID \
--target-dir <s3a://path>
I am able to replicate the data, but I need to get the count of processed records from the same command, without using a separate command such as eval, because other records may be ingested into the source in the meantime.
What I want is to capture this record count:
18/05/21 22:55:55 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 47.9229 seconds (0 bytes/sec)
18/05/21 22:55:55 INFO mapreduce.ImportJobBase: Retrieved 33372 records.
I have found a way to do this: pass the Sqoop command to subprocess as shown below and use .communicate() to capture the whole output, including the warning and info messages.
import subprocess

sqoop_command = 'sqoop import........'
# Capture both streams; under Hadoop's default log4j settings,
# Sqoop's INFO/WARN messages are written to stderr, not stdout.
process = subprocess.Popen(sqoop_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
stdout, stderr = process.communicate()
stdout and stderr now hold the captured output; with Hadoop's default log4j configuration, the INFO lines shown above land in stderr.
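To turn the captured log into a number, here is a small follow-up sketch. It assumes the stderr variable from the snippet above and that the log contains the "Retrieved ... records." line shown earlier:

import re

# Parse the "Retrieved N records." line that ImportJobBase writes to the log.
match = re.search(r'Retrieved (\d+) records', stderr.decode('utf-8'))
record_count = int(match.group(1)) if match else None
print(record_count)  # e.g. 33372 for the run shown above

It is also worth checking process.returncode before trusting the count, so a failed import is not mistaken for zero records.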