There is a table with columns word and sentence. I am trying to replace the words in a sentence if the word exits in "words" column.
Tried the below code but it works only for single word. But I need to replace multiple terms if it exits in the word column.
Create table temp(id NUMBER,
word VARCHAR2(1000),
Sentence VARCHAR2(2000));
insert into temp(1,'automation testing','automation testing is popular kind of testing');
insert into temp(2,'testing','manual testing');
insert into temp(3,'manual testing','this is an old method of testing');
BEGIN
for t1 in (select id, word from temp)
LOOP
for t2 in (select rownum from temp where sentence is not null)
LOOP
update temp
set sentence = REPLACE(sentence, t1.word,t1.id)
where rownum = rownum;
END LOOP;
END LOOP;
END;
But I need to replace multiple terms if it exits in the word column.
Expected outcome:
id word sentence
1 automation testing 1 is popular kind of 2
2 testing 3
3 manual testing this is an old method of 2
Updated code:
MERGE INTO temp dst
USING (
WITH ordered_words ( rn, id, word, regex_safe_word ) AS (
SELECT ROW_NUMBER() OVER ( ORDER BY LENGTH( word ) ASC, word DESC ),
id,
word,
REGEXP_REPLACE( word, '([][)(}{|^$\.*+?])', '\\\1' )
FROM temp
),
sentences_with_ids ( rid, sentence, rn ) AS (
SELECT ROWID,
sentence,
( SELECT COUNT(*) + 1 FROM ordered_words )
FROM temp
UNION ALL
SELECT s.rid,
REGEXP_REPLACE(
REGEXP_REPLACE(
s.sentence,
'(^|\W)' || w.regex_safe_word || '($|\W)',
'\1${'|| w.id ||'}\2'
),
'(^|\W)' || w.regex_safe_word || '($|\W)',
'\1${' || w.id || '}\2'
),
s.rn - 1
FROM sentences_with_ids s
INNER JOIN ordered_words w
ON ( s.rn - 1 = w.rn )
),
sentences_with_words ( rid, sentence, rn ) AS (
SELECT rid,
sentence,
( SELECT COUNT(*) + 1 FROM ordered_words )
FROM sentences_with_ids
WHERE rn = 1
UNION ALL
SELECT s.rid,
REPLACE(
s.sentence,
'${' || w.id || '}',
'http://localhost/' || w.id || '/<u>' || w.word || '</u>'
),
s.rn - 1
FROM sentences_with_words s
INNER JOIN ordered_words w
ON ( s.rn - 1 = w.rn )
)
SELECT rid, sentence
FROM sentences_with_words
WHERE rn = 1
) src
ON ( dst.ROWID = src.RID )
WHEN MATCHED THEN
UPDATE
SET sentence = src.sentence;
Can we improve the performance of the above updated query?
2019-09-13 13:51 UTC - I see what you are getting at now. You want to replace not strings but words, and the new string will include the word(s). So you are doing a first series of replacements, then a second series.
To make this go faster, I would still do a JOIN of the words to the sentences in which they are found. Here is the SELECT part of the solution (the bit that goes into the USING clause) so you can check what goes on and see some results quicker. Some words of explanation after the code.
with words(id, word, word_length, search1, replace1, search2, replace2) as (
select id, word, length(word),
'(^|\W)' || REGEXP_REPLACE(word, '([][)(}{|^$\.*+?])', '\\\1') || '($|\W)',
'\1{'|| id ||'}\2',
'{'|| id ||'}',
'http://localhost/' || id || '/<u>' || word || '</u>'
FROM temp
)
, joined_data as (
select w.search1, w.replace1, w.search2, w.replace2,
s.rowid s_rid, s.sentence,
row_number() over(partition by s.rowid order by word_length desc) rn
from words w
join temp s
on instr(s.sentence, w.word) > 0
and regexp_like(s.sentence, w.search1)
)
, unpivoted_data as (
select S_RID, SENTENCE, PHASE, SEARCH_STRING, REPLACE_STRING,
row_number() over(partition by s_rid order by phase, rn) rn,
case when row_number() over(partition by s_rid order by phase, rn)
= count(*) over(partition by s_rid)
then 1
else 0
end is_last
from joined_data
unpivot(
(search_string, replace_string)
for phase in ( (search1, replace1) as 1, (search2, replace2) as 2 ))
)
, replaced_data(S_RID, RN, is_last, SENTENCE) as (
select S_RID, RN, is_last,
regexp_replace(SENTENCE, search_string, replace_string)
from unpivoted_data
where rn = 1
union all
select n.S_RID, n.RN, n.is_last,
case when n.phase = 1
then regexp_replace(o.SENTENCE, n.search_string, n.replace_string)
else replace(o.SENTENCE, n.search_string, n.replace_string)
end
from unpivoted_data n
join replaced_data o
on o.s_rid = n.s_rid and n.rn = o.rn + 1
)
select s_rid, sentence from replaced_data
where is_last = 1
order by s_rid;
The WORDS subquery gets the word length and the "search" and "replace" strings for both series of replacements.
JOINED_DATA joins the words to the sentences. I do INSTR first and do REGEXP only when needed because it costs more CPU.
UNPIVOTED_DATA splits the rows into two phases: the first replaces "automation testing" by "{1}" and the second replaces "{1}" by "http://localhost/1/automation testing". The rows are assigned the right sequence, and the "last" row for each sentence is identified.
REPLACED_DATA does either REGEXP_REPLACE or REPLACE depending on the phase. In the second phase, REPLACE is enough.
Regards, Stew