Am creating a wrapper for moving data from mysql to hdfs using sqoop through Oozie from shellscript.
I have an "InputFile" containing Source database, tables to be imported and the hdfs output directory from which the 'while-loop' in "Shell_Script.sh" reads it and checks for the sourceDB, if it is 'Mysql' then inside 'if-loop' it assigns the corresponding Mysql parameters to the mentioned variables in ShellScript.sh file.
I have "Sqoop_job.property file" in which I'have mentioned the Database parameters as generic from which the "Oozie workflow" gets the values for execution. In order to perform my Data-Ingestion in a generic way, I haven't mentioned the DB parameters directly in the Sqoop_job.property file.
Hence I should replace it through the "Shell_Script.sh" file. So I've tried to use "sed-command" for replacing the values of DB parameters from InputFile(SourceDB,db_name,Mysql_table,hdfsdir) and also from the assigned variables from the "if-condition"(driver,jdbcUri,UserName, Password)
On execution of "Shell_Script.sh" file my oozie job throws Error:java.lang.RuntimeException: Could not load db driver class: generic
InputFile.txt: It is a text file containing the DB details along with input and output path.
# SourceDB|db_name|Mysql_table|hdfsdir
Mysql|sqoop_check|WrapperTbl|/root/SqoopWrapper2/output1
Mysql|sqoop_check|WrapperTbl2|/root/SqoopWrapper2/output2
Sqoop_job.property: " The file that contains hadoop and mysql parameters as generic"
oozie.wf.application.path=/root/SqoopWrapper2/sqoop_workflow.xml
SourceDB=generic
db_name=generic
Mysql_table=generic
driver=generic
jdbcUri=generic
UserName=generic
Password=generic
hdfsdir=generic
SqoopScript.sh : " It is a script file on reading the input file it checks for DB source and get values from the InputFile & values assigned from 'if-loop' and creates a new Scoop_job.property file and executes the oozie workflow.xml "
# Assigned path of input file and reading it thro' while loop
input=/root/SqoopWrapper2/InputFile.txt
IFS='|'
while read -r SourceDB db_name Mysql_table hdfsdir
do
# Checks for source DB onreading the InputFile and assigns the corresponding DB parameters.
if [ "$SourceDB" = "Mysql" ]
then
driver=com.mysql.jdbc.Driver
jdbcUri=jdbc:mysql://edgenode:3306
UserName=****
Password=*****
# The values of SourceDB,db_name,Mysql_table,hdfsdir,driver,jdbcUri,UserName, Password should be replaced instead of "generic" in Scoop_job.property file and should create a new Scoop_job.property file in order to execute oozie workflow.xml as it calls the property file.
sed -e 's/SourceDB=generic/SourceDB="$SourceDB"/' > sqoop_job.properties
sed -e 's/db_name=generic/db_name="$db_name"/' > sqoop_job.properties
sed -e 's/Mysql_table=generic/Mysql_table="$Mysql_table"/' > sqoop_job.properties
sed -e 's/hdfsdir=generic/hdfsdir="$hdfsdir"/' > sqoop_job.properties
sed -e 's/jdbcUri=generic/jdbcUri="$jdbcUri"/' > sqoop_job.properties
sed -e 's/driver=generic/driver="$driver"/' > sqoop_job.properties
sed -e 's/UserName=generic/UserName="$Username"/' > sqoop_job.properties
sed -e 's/Password=generic/Password="$Password"/' > sqoop_job.properties
fi
# Oozie job executes on calling the sqoop_job.property file.
ooziejob=$(oozie job -oozie http://oozieserver/oozie -config /root/SqoopWrapper2/prop/sqoop_job.properties -run);
echo $ooziejob;
done < "$input"
exit 0
Oozie worflow.xml: "consist of sqoop actions to import data from mysql table to hdfs directory by getting values from Sqoop_job.property file"
<workflow-app name="sqoop-wf" xmlns="uri:oozie:workflow:0.4">
<global>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
</global>
<start to ="SqoopJob" />
<action name="SqoopJob">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<arg> import </arg>
<arg> --connect </arg>
<arg> ${jdbcUri} </arg>
<arg> --username </arg>
<arg> ${UserName}</arg>
<arg> --password </arg>
<arg> ${Password}</arg>
<arg> --table </arg>
<arg> ${Mysql_table} </arg>
<arg> --target-dir </arg>
<arg> ${hdfsdir} </arg>
<arg> --driver </arg>
<arg> &{driver} </arg>
<arg>-m</arg>
<arg>1</arg>
</sqoop>
<ok to="end" />
<error to="kill"/>
</action>
<kill name='kill'>
<message>Something went wrong</message>
</kill>
<end name='end' />
</workflow-app>
There are several issues here:
1- You are missing the last sed
delimiter "|".
For example:
sed -e 's|SourceDB=generic|sourceDB=$SourceDB|'
2- You have to tell sed
where you want to apply the command (file or variable), for example:
sed -e 's/SourceDB=generic/sourceDB=$SourceDB/' filename.txt
or
echo $variable | sed -e 's/SourceDB=generic/sourceDB=$SourceDB/'
3- With ">" operator, you will be always overwriting your output file. If you want to append new lines you must use ">>". For example:
sed -e 's/SourceDB=generic/sourceDB=$SourceDB/' filename.txt >> outputfile.txt