In Oracle, I have a database CLOB field I need to extract data from.
The specification looks like:
[<br /><b>A:</b><br />Text A which does not contain HTML[<br />]]
[<br /><b>B:</b><br />Text B which does not contain HTML[<br />]]
[<br /><b>C:</b><br />Text C which does not contain HTML[<br />]]
Any of the three may be missing or in any order. There is not always a newlines separating the fields, and the fields can contain newlines.
I need:
<br /><b>
, but a better regex would be awesome.Examples:
field | Valid | A | B | C |
---|---|---|---|---|
<br /><b>A:</b><br />Foo<br /> <br /><b>B:</b><br />Bar<br /> <br /><b>C:</b><br />Baz<br /> |
Yes | Foo | Bar | Baz |
<br /><b>A:</b><br />Foo |
Yes | Foo | ||
<br /><b>B:</b><br />Bar<br /> |
Yes | Bar | ||
<br /><b>A:</b><br />Foo <br /><b>B:</b><br />Bar<br /> |
Yes | Foo | Bar | |
<br /><b>A:</b><br />Foo<br /> <br /><b>C:</b><br />Baz<br /> |
Yes | Foo | Baz |
Are there any regex gurus who might be able to tell me if/how I could extract A, B, and/or C?
Thanks!
Edit: I've added a SQLFiddle at http://sqlfiddle.com/#!4/9aae2/14/0
I asked some friends on Facebook as well.
The expression that did what I wanted most -- just to return the values of A, B, and C -- was:
(.*<b>[ABC]:<.b>)(<br .>)?([^<]*)
using the parameters
([^<]*)
See SQL Fiddle here.
Thank you to pandachelion and spinjector for helping, and my friend Todd Zimnoch for the regexp (and parameters) that did what I wanted most.