Search code examples
sql-serverregexsql-server-2014capturing-group

Can I use regex capturing groups in SQL Server 2014?


I have some text data in an SQL Server 2014 table in which I want to detect complex patterns and extract certain portions of the text if the text matches the pattern. Because of this, I need capturing groups.

E.g.
From the text

"Some title, Some Journal name, vol. 5, p. 20-22"

I want to grab the volume number

, vol\. ([0-9]+), p\. [0-9]+

Mind that I have simplified this use-case to improve readability. The above use-case could be solved without capturing groups. The actual use-case handles a lot more exceptions, like:

  • The journal/title containing "vol.".
  • Volume numbers/pages containing letters
  • "vol" being followed by ":" or ";" instead of "."
  • ...

The actual regex I use is the following (yet, this is not a question on regex structure, just elaborating on why I need capturing groups).

(^|§|[^a-z0-9])vol[^a-z0-9]*([a-z]?[0-9]+[a-z]?)

As far as I know, there are two ways of getting Regex functionality into SQL Server.

Since installing and setting up the entire Master Data Services package felt like overkill to get some Regex functionality, I was hoping there'd be an easy, common way out...


Solution

  • I have found a CLR implementation that is super easy to install, and includes Regex capturing group functions.

    http://www.sqlsharp.com/

    I have installed this in a separate database called 'SQL#' (simply by using the provided installation .sql script), and the functions are located inside a schema with the same name. As a result I can use the function as follows:

    select SQL#.SQL#.RegEx_CaptureGroup( 'test (2005) test', '\((20[012][0-9]|19[5-9][0-9])\)', 1, NULL, 1, -1, '');
    

    Would be nice if this was included by default in SQL Server...