I have some text data in an SQL Server 2014 table in which I want to detect complex patterns and, when the text matches a pattern, extract certain portions of it. For this, I need capturing groups.
For example, from the text
"Some title, Some Journal name, vol. 5, p. 20-22"
I want to grab the volume number with this pattern:
, vol\. ([0-9]+), p\. [0-9]+
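Outside SQL Server, you can check what this pattern captures with a quick Python sketch. Python's `re` module is only a stand-in for the .NET regex engine that a CLR function would use. For this pattern (escaped dots, `[0-9]+`, one numbered group) the two engines behave the same. The `extract_volume` helper is hypothetical.

```python
import re

# The simplified pattern from above: group 1 holds the volume number.
VOLUME_PATTERN = re.compile(r", vol\. ([0-9]+), p\. [0-9]+")

def extract_volume(text):
    """Return the text captured by group 1, or None if the pattern does not match."""
    match = VOLUME_PATTERN.search(text)
    return match.group(1) if match else None

print(extract_volume("Some title, Some Journal name, vol. 5, p. 20-22"))
```

Here it prints `5`. A string without the `, p. ` part that follows the volume returns `None`.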
Note that I have simplified this use case for readability; the simplified version above could be solved without capturing groups. The actual use case handles many more exceptions, such as:
The actual regex I use is the following. This is not a question about the regex itself; I include it only to explain why I need capturing groups.
(^|§|[^a-z0-9])vol[^a-z0-9]*([a-z]?[0-9]+[a-z]?)
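The following Python sketch shows what this pattern captures and why it needs more than one group. Python's `re` is again only a stand-in for the .NET engine. The helper name `extract_actual_volume` is hypothetical. Group 1 only anchors `vol` to the start of the string, a `§`, or a non-alphanumeric character, so the volume itself ends up in group 2.

```python
import re

# The actual pattern from above. Group 1 only makes sure "vol" starts the
# string or follows "§" / a non-alphanumeric character; group 2 is the volume.
ACTUAL_PATTERN = re.compile(r"(^|§|[^a-z0-9])vol[^a-z0-9]*([a-z]?[0-9]+[a-z]?)")

def extract_actual_volume(text):
    """Return capture group 2 (the volume), or None when there is no match."""
    match = ACTUAL_PATTERN.search(text)
    return match.group(2) if match else None

for sample in ["Some Journal, vol. 12a, p. 3", "§vol 7", "vol.5", "revolution 5"]:
    print(repr(sample), "->", extract_actual_volume(sample))
```

The last sample shows why the leading group exists. The `vol` inside "revolution" is preceded by a letter, so nothing is captured. The sketch matches case-sensitively, exactly as the pattern is written.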
As far as I know, there are two ways of getting regex functionality into SQL Server: installing Master Data Services, which ships with regex functions, or deploying a CLR assembly.
Installing and setting up the entire Master Data Services package felt like overkill just to get some regex functionality, so I was hoping for a simpler, more common way out.
I found a CLR implementation, SQL#, that is very easy to install and includes regex capturing-group functions.
I installed it in a separate database called SQL# (simply by running the provided installation .sql script), and the functions are located in a schema with the same name. As a result, I can call the function as follows:
select SQL#.SQL#.RegEx_CaptureGroup( 'test (2005) test', '\((20[012][0-9]|19[5-9][0-9])\)', 1, NULL, 1, -1, '');
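To check what the call above should return, the sketch below runs the same pattern on the same input in Python. It makes two assumptions, inferred from the example and not from the SQL# documentation. First, the third argument is the capture group number. Second, `NULL` is what comes back when nothing matches. The `capture_group` helper is hypothetical and mirrors only those arguments.

```python
import re

def capture_group(text, pattern, group_number, not_found=None):
    """Hypothetical stand-in for a capture-group function: return the text of
    the requested group from the first match, or `not_found` if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return not_found
    return match.group(group_number)

# Same input and pattern as the SQL# example; group 1 is the four-digit year.
print(capture_group("test (2005) test", r"\((20[012][0-9]|19[5-9][0-9])\)", 1))
```

This prints `2005`. For the actual volume regex shown earlier, the same kind of call would need group number 2, because group 1 there only holds the delimiter before `vol`.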
It would be nice if this were included in SQL Server by default.