Search code examples
linuxshellsedworkflowoozie

updated shell script code as suggested: Need to know command for "sed" in replacing values


Am creating a wrapper for moving data from mysql to hdfs using sqoop through Oozie from shellscript.

I have an "InputFile" containing Source database, tables to be imported and the hdfs output directory from which the 'while-loop' in "Shell_Script.sh" reads it and checks for the sourceDB, if it is 'Mysql' then inside 'if-loop' it assigns the corresponding Mysql parameters to the mentioned variables in ShellScript.sh file.

I have "Sqoop_job.property file" in which I'have mentioned the Database parameters as generic from which the "Oozie workflow" gets the values for execution. In order to perform my Data-Ingestion in a generic way, I haven't mentioned the DB parameters directly in the Sqoop_job.property file.

Hence I should replace it through the "Shell_Script.sh" file. So I've tried to use "sed-command" for replacing the values of DB parameters from InputFile(SourceDB,db_name,Mysql_table,hdfsdir) and also from the assigned variables from the "if-condition"(driver,jdbcUri,UserName, Password)

On execution of "Shell_Script.sh" file my oozie job throws Error:java.lang.RuntimeException: Could not load db driver class: generic

InputFile.txt: It is a text file containing the DB details along with input and output path.

    # SourceDB|db_name|Mysql_table|hdfsdir
    Mysql|sqoop_check|WrapperTbl|/root/SqoopWrapper2/output1
    Mysql|sqoop_check|WrapperTbl2|/root/SqoopWrapper2/output2

Sqoop_job.property: " The file that contains hadoop and mysql parameters as generic"

oozie.wf.application.path=/root/SqoopWrapper2/sqoop_workflow.xml
SourceDB=generic
db_name=generic
Mysql_table=generic
driver=generic
jdbcUri=generic
UserName=generic
Password=generic
hdfsdir=generic   

SqoopScript.sh : " It is a script file on reading the input file it checks for DB source and get values from the InputFile & values assigned from 'if-loop' and creates a new Scoop_job.property file and executes the oozie workflow.xml "

# Assigned path of input file and reading it thro' while loop

  input=/root/SqoopWrapper2/InputFile.txt

IFS='|'

while read -r SourceDB db_name Mysql_table hdfsdir 

do

# Checks for source DB onreading the InputFile and assigns the corresponding DB parameters.

    if [ "$SourceDB" = "Mysql" ]
        then
        driver=com.mysql.jdbc.Driver
        jdbcUri=jdbc:mysql://edgenode:3306
        UserName=****
        Password=*****

# The values of SourceDB,db_name,Mysql_table,hdfsdir,driver,jdbcUri,UserName, Password should be replaced instead of "generic" in Scoop_job.property file and should create a new Scoop_job.property file in order to execute oozie workflow.xml as it calls the property file.

        sed -e 's/SourceDB=generic/SourceDB="$SourceDB"/' > sqoop_job.properties
        sed -e 's/db_name=generic/db_name="$db_name"/' > sqoop_job.properties
        sed -e 's/Mysql_table=generic/Mysql_table="$Mysql_table"/'  > sqoop_job.properties
        sed -e 's/hdfsdir=generic/hdfsdir="$hdfsdir"/'  > sqoop_job.properties

        sed -e 's/jdbcUri=generic/jdbcUri="$jdbcUri"/'  > sqoop_job.properties
        sed -e 's/driver=generic/driver="$driver"/' > sqoop_job.properties
        sed -e 's/UserName=generic/UserName="$Username"/' > sqoop_job.properties
        sed -e 's/Password=generic/Password="$Password"/' > sqoop_job.properties

        fi

# Oozie job executes on calling the sqoop_job.property file.

ooziejob=$(oozie job -oozie http://oozieserver/oozie -config /root/SqoopWrapper2/prop/sqoop_job.properties -run);

echo $ooziejob;

done < "$input"

exit 0

Oozie worflow.xml: "consist of sqoop actions to import data from mysql table to hdfs directory by getting values from Sqoop_job.property file"

    <workflow-app name="sqoop-wf" xmlns="uri:oozie:workflow:0.4">

    <global>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
    </global>
    <start to ="SqoopJob" />
    <action name="SqoopJob">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>

            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>  


    <arg>  import     </arg>
<arg>  --connect  </arg>
<arg>  ${jdbcUri} </arg>
<arg>  --username </arg>
<arg>  ${UserName}</arg>
<arg>  --password </arg>
<arg>  ${Password}</arg>
<arg>  --table    </arg>
<arg>  ${Mysql_table}    </arg>
<arg>  --target-dir </arg>
<arg>  ${hdfsdir}    </arg>
<arg>  --driver     </arg>
<arg>  &{driver}  </arg>
<arg>-m</arg>
<arg>1</arg>

    </sqoop>
        <ok to="end" />
        <error to="kill"/>
    </action>
   <kill name='kill'>
        <message>Something went wrong</message>

    </kill>

    <end name='end' /> 
 </workflow-app>

Solution

  • There are several issues here:

    1- You are missing the last sed delimiter "|".

    For example:

    sed -e 's|SourceDB=generic|sourceDB=$SourceDB|'
    

    2- You have to tell sed where you want to apply the command (file or variable), for example:

    sed -e 's/SourceDB=generic/sourceDB=$SourceDB/' filename.txt
    

    or

    echo $variable | sed -e 's/SourceDB=generic/sourceDB=$SourceDB/'
    

    3- With ">" operator, you will be always overwriting your output file. If you want to append new lines you must use ">>". For example:

    sed -e 's/SourceDB=generic/sourceDB=$SourceDB/' filename.txt >> outputfile.txt