xml oracle-database clob xmltype xmltable

CLOB Commits not working when run back to back

We are running an older Oracle Server, 10.1.0.5.

I have some simple code replaces one CLOB with another in a different column. I am replacing specific strings with another simpler string in a different column, so I can perform some basic XML Parsing. Ref: here.

I would expect to do a series of update commands, with commits in between each, and have all of the fields updated properly. If I run it as follows, only the last query gets committed, despite running as a script. The intermittent commits don't seem to take.

I can get it to work, by running each update + commit line by line, but I am wondering why I can't run these all at once. I have only shown the first 2 updates. What am I missing?

set define off

update TABLE1 SET COL_CLEANED = replace(COL_ORIGINAL,'&#xA0;',  ' ');
COMMIT;

update TABLE1 SET COL_CLEANED = replace(COL_ORIGINAL,'&#xA7;',  'SECT.');
COMMIT;

update TABLE1 ...

Thank you,

sse

Solution

As @OldProgrammer said, you're repeatedly replacing the COL_CLEANED value and overwriting earlier changes...

Imagine you started with a simple value that required multiple replacements:

create table table1 (col_original clob, col_cleaned clob);

insert into table1 (col_original)
values (to_clob('Test&#xA0;without&#xA0;breaks - &#xA7;1'));

and did the first update:

update TABLE1 SET COL_CLEANED = replace(COL_ORIGINAL,'&#xA0;',  ' ');

select col_original, col_cleaned from table1;

COL_ORIGINAL                             COL_CLEANED                             
---------------------------------------- ----------------------------------------
Test&#xA0;without&#xA0;breaks - &#xA7;1  Test without breaks - &#xA7;1

Now you do the second update:

update TABLE1 SET COL_CLEANED = replace(COL_ORIGINAL,'&#xA7;',  'SECT.');

select col_original, col_cleaned from table1;

COL_ORIGINAL                             COL_CLEANED                             
---------------------------------------- ----------------------------------------
Test&#xA0;without&#xA0;breaks - &#xA7;1  Test&#xA0;without&#xA0;breaks - SECT.1

There has been no commit and no rollback. But you took the original value - which still had both the old patterns - and replaced the second pattern in that. The first update has simply been overwritten by the second.

You need to apply successive updates to the cleaned value; only the first update uses the original value:

-- first update is based on COL_ORIGINAL
update TABLE1 SET COL_CLEANED = replace(COL_ORIGINAL,'&#xA0;',  ' ');

select col_original, col_cleaned from table1;

COL_ORIGINAL                             COL_CLEANED                             
---------------------------------------- ----------------------------------------
Test&#xA0;without&#xA0;breaks - &#xA7;1  Test without breaks - &#xA7;1           

-- subsequent updates are based on COL_CLEANED to keep earlier changes    
update TABLE1 SET COL_CLEANED = replace(COL_CLEANED,'&#xA7;',  'SECT.');
--------------------------------------------^^^^^^^

select col_original, col_cleaned from table1;

COL_ORIGINAL                             COL_CLEANED                             
---------------------------------------- ----------------------------------------
Test&#xA0;without&#xA0;breaks - &#xA7;1  Test without breaks - SECT.1

The commits aren't really needed, incidentally; you should commit once per logical transaction, not once per statement.

Incidentally, you could use the utl_i18n.unescape_reference() function for this, but in your example it will give you a mutibyte 'section' character rather than the string 'SECT.':

update TABLE1 SET COL_CLEANED = UTL_I18N.UNESCAPE_REFERENCE(COL_ORIGINAL);

select col_original, col_cleaned from table1;

COL_ORIGINAL                             COL_CLEANED                             
---------------------------------------- ----------------------------------------
Test&#xA0;without&#xA0;breaks - &#xA7;1  Test without breaks - §1

and it may make other changes differently to your other hard-coded replacements. On the other hand, at some point your col_original has probably been passed through a similar function to escape references to the values you currently have stored; as the real original text would have had § for that to have been escaped to §, it may be more appropriate to reinstate it to that original character anyway. Unless you're ending up in a character set that can't display it, perhaps.