Good morning,
I am hoping to find assistance with writing a select query to remove some text from a column.
I have created a column called "TEXT_MINING" in a previous query, that some code a different developer wrote will perform some text mining analysis on. The TEXT_MINING column has text that looks like:
EMPLOYEE FOUND BROKEN HANDLE ON HAMMER * 02-08-18 15:19:22 PM * I found a hammer that had the wood split on the handle, tossed into scrap.
I want to remove the * and all of the text in between the two * to help my software engineer do some text mining. Here is my current dilemma:
Not only do I not know how to use REGEXP_REPLACE, but I can't get the REGEXP worked out. I currently have:
^[*]\w[*]$
So it looks like:
REGEXP_REPLACE(col, '^[*]\w[*]$', '')
Could anyone advise?
Thank you!
You may use this approach to remove 1+ occurrences of *...*
substrings in your column:
SELECT REGEXP_REPLACE(
'EMPLOYEE FOUND BROKEN HANDLE ON HAMMER * 02-08-18 15:19:22 PM * I found a hammer that had the wood split on the handle, tossed into scrap.',
'\s*\*[^*]*\*',
''
) as Result from dual
See the online demo
Pattern details
\s*
- 0+ whitespaces\*
- a *
char[^*]*
- 0+ chars other than *
\*
- a *
char.See the regex demo.