Search code examples
sqloracle-databasesql-loaderregexp-replace

Load data from this file. (load as EXCEL or CSV)


EmpId      Name
----------------------                                                                                 
E0008      Mary _Jane
EMP0009    PeterParker
10         Jack roger
EMP 0012   Mark Todd
EMP0011    John Smith
EMP13      *Peter Parker

Need to satisfy below conditions

  1. Load data from this file (load as EXCEL or CSV)

  2. While loading data create multiple batches

  3. While loading data, apply below rules

    • (a) Remove extra spaces in Name column and replace with single space
    • (b) Where there is no space between First and lastname add a space
    • (c) In name, remove any special characters
    • (d) In name, Change to title case
    • On EmpID remove all special characters and EmpID should be in EMP0011 format.

My control is below

load data

infile 'E:\A\Book2.csv'

TRUNCATE

PRESERVE BLANKS

into table dummy

fields terminated by ',' 

optionally enclosed by '"'AND'"'

(

 EMPID "CASE WHEN SUBSTR(:EMPID,1,1)='E' THEN 'EMP00'||SUBSTR(:EMPID,-2) ELSE 
'EMP00'||SUBSTR(:EMPID,-2) END",

 NAME "INITCAP(REGEXP_REPLACE(:NAME,'[^A-Z a-z]'))"

)

I'm unable to satisfy conditions 2, 3b, 3d


Solution

  • Have a look at the following example.

    Control file:

    load data 
    infile *
    replace
    into table dummy
    fields terminated by ";" 
    trailing nullcols
    (
    empid "'EMP' || lpad(regexp_substr(:empid, '[[:digit:]]+$'), 4, '0')",
    name  "trim(initcap(regexp_replace(regexp_replace(regexp_replace(:name, '[^[:alnum:] ]'), '([[:upper:]])', ' \\1'), ' +', ' ')))"
    )
    
    begindata
    E0008;Mary _Jane
    EMP0009;PeterParker
    10;Jack roger
    EMP 0012;Mark  Todd
    EMP0011;John Smith
    EMP13;*Peter Parker
    

    Loading session and the result:

    SQL> $sqlldr scott/tiger@kc11gt control=test32.ctl log=test32.log
    
    SQL*Loader: Release 11.2.0.1.0 - Production on Uto Ruj 1 10:18:01 2020
    
    Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
    
    Commit point reached - logical record count 5
    Commit point reached - logical record count 6
    
    SQL> select * from dummy;
    
    EMPID      NAME
    ---------- --------------------
    EMP0008    Mary Jane
    EMP0009    Peter Parker
    EMP0010    Jack Roger
    EMP0012    Mark Todd
    EMP0011    John Smith
    EMP0013    Peter Parker
    
    6 rows selected.
    
    SQL>
    

    What does the control file do?

    For EMPID: as it must begin with EMP, it is set to be a constant. Digits within input data are left padded with zeros up to 4 characters in length.

    For NAME:

    • the innermost regexp [^[:alnum:] ] removes anything but alphanumerics and spaces
    • one level up regexp '([[:upper:]])', ' \\1' adds space in front of every uppercase letter
    • one more level up regexp ' +', ' ' replaces multiple spaces with a single space
    • INITCAP capitalizes first letters in every word
    • TRIM removes leading (and trailing) spaces