Search code examples
regexwindowsbashsedsql-scripts

It is a way to modify an SQL Insert Script with SED Command in order to insert word DATE before date value?


Modifying SQL Scripts with SED Command

I would like to explain my task. I have been asked to modify many files with one command without alter the data. The core is to properly insert DATE_B values, since it would give us an error trying to insert this into MYTABLE, due to the fact we create DATE_B as DATE datatype.

My Scripts

Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),'2019-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2018-10-30','YYYY-MM-DD'),'2021-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2022-08-11','YYYY-MM-DD'),'2022-08-06');

I have to modify all DATE_B values in order to insert DATE before the value. What I mean is:

Expected Scritps

Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),DATE'2019-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2018-10-30','YYYY-MM-DD'),DATE'2021-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2022-08-11','YYYY-MM-DD'),DATE'2022-08-06');

My approach was to use REGEX in order to identify the single quote, year, dash, month, dash, day, dash, single quote... But if I do this I will substitute also my properly formatted to_timestamp values.

I found this command from GeekForGeeks useful since we catch the first letter and we enclose them with parenthesis.

$ echo "Welcome To The Geek Stuff" | sed 's/\(\b[A-Z]\)/\(\1\)/g'

Trying to approach my goal:

$ echo "Welcome To The Geek Stuff '2010-10-12' '1999-10-10'" | sed -E 's/(\b[A-Z])/DATE\1/g'

Using regex:

$ echo "Welcome To The Geek Stuff '2010-10-12' '1999-10-10'" | sed -E 's/'('\d{4}-)'/DATE\1/g'
$ echo "Welcome To The Geek Stuff" ,'1999-10-10' | sed 's/^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$/'DATE\1'/g'

I also read about h, H subcommand of sed, but it is not clear enough for me to make it works.

I would appreciate if someone could explain how to properly implement sed h subcommand and how to properly use regex with sed in order to identify COMMA, single quote, year, dash, month, dash, day, dash, single quote. Save this and insert word DATE

Thank you beforehand!


Solution

  • Assumptions/Understandings:

    • the column name isn't important; what's important are strings of the form 'YYYY-MM-DD'
    • these strings can occur anywhere in the values (...) clause; this means ...
    • we need to address strings at the beginning of the values (...) clause (ie, has a preceding () as well as strings elsewhere in the values(...) clause (ie, has a preceding ,) but ...
    • also need to refrain from adding the DATE prefix if the string is embedded within a function (eg, to_timestamp('YYYY-MM-DD'))
    • assume all INSERT statements are formatted as shown in the sample (eg, a space between values and (; no white space before a 'YYYY-MM-DD' string)

    Adding a few scenarios to our input:

    $ cat insert.sql
    Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),'2019-09-09');
    Insert into MYTABLE (DATE_A, DATE_C) values ('2021-09-09',to_timestamp('2018-10-30','YYYY-MM-DD'));
    Insert into MYTABLE (DATE_A, MY_INT, DATE_B) values ('2022-08-11','2344','2022-08-06');
    

    One sed idea:

    sed -E "s/( \(|,)('[0-9]{4}-[0-9]{2}-[0-9]{2}')/\1DATE\2/g" insert.sql
    

    Where:

    • -E - enable extended regex support (eliminates need to escape parens and braces)
    • "s/..../g" - wrap sed script in double quotes to eliminate need to escape single quote within the script
    • ( \(|,) - (1st capture group) space + ( or a comma (in this case we need the escape to designate the ( as part of the data)
    • ('[0-9]{4}-[0-9]{2}-[0-9]{2}') - (2nd capture group) our string 'YYYY-MM-DD'
    • \1DATE\2 - inserting the DATE string between the two capture groups

    This generates:

    Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),DATE'2019-09-09');
    Insert into MYTABLE (DATE_A, DATE_C) values (DATE'2021-09-09',to_timestamp('2018-10-30','YYYY-MM-DD'));
    Insert into MYTABLE (DATE_A, MY_INT, DATE_B) values (DATE'2022-08-11','2344',DATE'2022-08-06');
    

    Once satisfied with the results OP can add the -i flag to have sed overwrite the input file.