sql sql-server sql-server-2008 recursion patindex

SQL Server 2008 PATINDEX recursion

I want to find the latest instance of an expression, then keep looking to see whether there is a better match further up, and then choose the best match.

The cell I am looking at is a repeatedly appended log: notes, followed by the username and timestamp of whoever entered them.

Example cell contents:

Starting the investigation.
JWAYNE entered the notes above on 08/12/1976 12:01

Taking over the case. Not a lot of progress recently.
CEASTWOOD entered the notes above on 03/14/2001 09:04

No wonder this case is not progressing, the whole town is covering up some shenanigans!
CEASTWOOD entered the notes above on 03/21/2001 05:23

Star command was right, this investigation has been tossed around like a hot potato for a long time!
BLIGHTYEAR entered the notes above on 08/29/2659 08:01

I am not an expert on database normal-form rules, but it is annoying that the entries are jammed together into one cell. That makes isolating and checking the notes for specific words harder, especially because the cell is duplicated across multiple rows until the investigation is closed, which puts the notes from later phases into the Notes column of earlier events. On top of that, the row timestamps don't line up with the timestamps in the notes, so a PATINDEX search on the timestamp is unreliable even with a margin of a few minutes, like this:

CaseID, Username,   Notes,             Phase, Timestamp
E18902, JWAYNE,     Starting....08:01, E1,    03/14/2001 09:13
E18902, CEASTWOOD,  Starting....08:01, E2,    03/14/2001 09:13
E18902, CEASTWOOD,  Starting....08:01, E3,    03/21/2001 05:34
E18902, BLIGHTYEAR, Starting....08:01, E4,    08/29/2659 07:58

Right now I REVERSE the whole string, use PATINDEX to find the username, and then use SUBSTRING to select only the note for that phase of the investigation. The problem is that when the same user enters notes for multiple phases, my simple "find the first match starting at the end of the string and moving up" approach picks up the wrong entry. My first thought is to search for the username and then check whether an entry further up is a better match (note timestamp vs. column timestamp), but I am not sure how to code that.
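The backward search described above can be sketched roughly as follows. This is a reconstruction, not the asker's actual code: the variable names are hypothetical, @notes is a shortened copy of the example cell, and the search string is the full signature prefix rather than the bare username. It shows the failure mode: the search always stops at CEASTWOOD's last entry (03/21/2001 05:23), even when the row being processed is the E2 phase from 03/14/2001.

```sql
-- Hypothetical names: @notes is a shortened copy of the example cell,
-- @marker is the signature prefix of the user on the current phase row.
declare @crlf   nchar(2)      = nchar(13) + nchar(10);
declare @notes  nvarchar(max) =
      N'Taking over the case. Not a lot of progress recently.' + @crlf
    + N'CEASTWOOD entered the notes above on 03/14/2001 09:04' + @crlf + @crlf
    + N'No wonder this case is not progressing, the whole town is covering up some shenanigans!' + @crlf
    + N'CEASTWOOD entered the notes above on 03/21/2001 05:23';
declare @marker nvarchar(100) = N'CEASTWOOD entered the notes above on';

-- PATINDEX on the reversed strings finds the LAST occurrence of the marker;
-- the arithmetic maps that position back onto the original string.
-- (LEN ignores trailing spaces, which is why @marker does not end in one.
--  If the marker is missing, PATINDEX returns 0 and @lastStart is meaningless.)
declare @revPos    int = PATINDEX(N'%' + REVERSE(@marker) + N'%', REVERSE(@notes));
declare @lastStart int = LEN(@notes) - @revPos - LEN(@marker) + 2;

-- The 16 characters after "... on " are the mm/dd/yyyy hh:mi timestamp.
select  LastMatchStart = @lastStart,
        MatchedStamp   = SUBSTRING(@notes, @lastStart + LEN(@marker) + 1, 16);
```

Nothing in this search looks at the row's own timestamp, so every CEASTWOOD row gets the same (latest) note.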

Do I have to get into complicated string splitting, or is there a simpler solution?

Solution

Here's my suggestion. This is for one record, but you can convert it to a user-defined table-valued function, if you like.

I'm going to use the example data you had above.

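    declare @sourceText nvarchar(max)
        ,   @workText   nvarchar(max)
        ,   @xml        xml
        ,   @crlf       nchar(2) = nchar(13) + nchar(10)

    -- Replace this with the value of your notes column. The example text from
    -- your question is used here, with CR+LF line breaks.
    set @sourceText =
          N'Starting the investigation.' + @crlf
        + N'JWAYNE entered the notes above on 08/12/1976 12:01' + @crlf + @crlf
        + N'Taking over the case. Not a lot of progress recently.' + @crlf
        + N'CEASTWOOD entered the notes above on 03/14/2001 09:04' + @crlf + @crlf
        + N'No wonder this case is not progressing, the whole town is covering up some shenanigans!' + @crlf
        + N'CEASTWOOD entered the notes above on 03/21/2001 05:23' + @crlf + @crlf
        + N'Star command was right, this investigation has been tossed around like a hot potato for a long time!' + @crlf
        + N'BLIGHTYEAR entered the notes above on 08/29/2659 08:01'
    set @workText = @sourceText

    -- Escape the characters XML treats specially, so notes containing
    -- &, < or > don't make the CONVERT to xml fail. (& has to go first.)
    set @workText = REPLACE(@workText, '&', '&amp;')
    set @workText = REPLACE(@workText, '<', '&lt;')
    set @workText = REPLACE(@workText, '>', '&gt;')

    -- Replace all the carriage returns and line feeds with a character that is
    -- unlikely to appear in your text. (If "|" can appear, use some other character.)
    set @workText = REPLACE(@workText, char(10), '|')
    set @workText = REPLACE(@workText, char(13), '|')

    -- Now turn the text into XML. Our first target is the run of four "|"
    -- characters that each blank line between entries becomes (CR+LF twice).
    -- If your text uses LF-only line breaks, or has more blank lines between
    -- entries, adjust the count accordingly.
    set @workText = REPLACE(@workText, '||||', '</line></entry><entry><line>')

    -- Every remaining "|" becomes a line boundary. The "||" left by each CR+LF
    -- inside an entry produces an empty <line></line> pair, removed below.
    set @workText = REPLACE(@workText, '|', '</line><line>')

    -- Construct the rest of the XML, get rid of any empty nodes, and convert
    -- the variable to an actual xml value.
    set @workText = '<entry><line>' + @workText + '</line></entry>'
    set @workText = REPLACE(@workText, '<line></line>', '')

    set @xml = CONVERT(xml, @workText)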

We should now have an XML fragment that looks like this. (You can see it by adding select @xml to the script at this point.)

<entry>
  <line>Starting the investigation.</line>
  <line>JWAYNE entered the notes above on 08/12/1976 12:01</line>
</entry>
<entry>
  <line>Taking over the case. Not a lot of progress recently.</line>
  <line>CEASTWOOD entered the notes above on 03/14/2001 09:04</line>
</entry>
<entry>
  <line>No wonder this case is not progressing, the whole town is covering up some shenanigans!</line>
  <line>CEASTWOOD entered the notes above on 03/21/2001 05:23</line>
</entry>
<entry>
  <line>Star command was right, this investigation has been tossed around like a hot potato for a long time!</line>
  <line>BLIGHTYEAR entered the notes above on 08/29/2659 08:01</line>
</entry>

We can now transform this XML into XML we like better:

  set @xml = @xml.query(
  'for $entry in /entry
    return <entry><data>
    {
    for $line in $entry/line[position() < last()] 
    return string($line)
    }
    </data>
    <timestamp>{ data($entry/line[last()]) }</timestamp>     
 </entry>
 ')

This gives us XML that looks like this (just one entry shown, for length reasons):

<entry>
    <data>Starting the investigation.</data>
    <timestamp>JWAYNE entered the notes above on 08/12/1976 12:01</timestamp>
</entry>
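The transform treats every line except the last one in an entry as note text and the last line as the signature, so a note that spans several lines is flattened into one data element. The sketch below runs the same XQuery on a hypothetical single entry (the two note lines are made up for illustration). The note lines should come back joined by a single space, since XQuery separates adjacent atomic values with a space inside element content.

```sql
-- Hypothetical entry whose note spans two lines, in the shape the first
-- step produces (<entry><line>...</line>...</entry>).
declare @xml xml = N'<entry><line>First line of a note.</line><line>Second line of the same note.</line><line>JWAYNE entered the notes above on 08/12/1976 12:01</line></entry>';

-- Same transform as above: all lines but the last are note text,
-- the last line is the signature.
set @xml = @xml.query(
  'for $entry in /entry
   return <entry>
            <data>{ for $line in $entry/line[position() < last()] return string($line) }</data>
            <timestamp>{ data($entry/line[last()]) }</timestamp>
          </entry>');

select  Note      = @xml.value('(/entry/data)[1]', 'nvarchar(max)'),
        Signature = @xml.value('(/entry/timestamp)[1]', 'nvarchar(max)');
```

If you need to keep the original line breaks inside a note, this step is the place to change it.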

You can convert the transformed XML back to tabular data with this query:

select  EntryData      = R.lines.value('data[1]', 'nvarchar(max)')
    ,   EntryTimestamp = R.lines.value('timestamp[1]', 'nvarchar(max)')
from    @xml.nodes('/entry') as R(lines)

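... and get one row per entry, with the note text in EntryData (for example, "Starting the investigation.") and the signature line in EntryTimestamp (for example, "JWAYNE entered the notes above on 08/12/1976 12:01").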

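From there, you can work with each entry on its own: for example, parse the username and date out of EntryTimestamp and compare them with your Username and Timestamp columns to find the right note for each phase.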
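One way to choose the best note per phase row, which is what the question is ultimately after, is to split each signature into a username and a datetime and then, for each row, keep that user's note whose time is nearest the row's timestamp. This is a minimal sketch, not a definitive solution: it assumes every signature ends in mm/dd/yyyy hh:mi, uses hypothetical table variables (@entries, @phases, @parsed, @matched) filled with the example data, and treats "nearest, in minutes, for the same user" as the matching rule, which is a choice rather than something the data guarantees.

```sql
-- Hypothetical inputs: @entries holds the rows the query above returns for the
-- example cell; @phases holds the rows from the question's table.
declare @entries table (EntryData nvarchar(max), EntryTimestamp nvarchar(max));
insert into @entries (EntryData, EntryTimestamp) values
    (N'Starting the investigation.', N'JWAYNE entered the notes above on 08/12/1976 12:01'),
    (N'Taking over the case. Not a lot of progress recently.', N'CEASTWOOD entered the notes above on 03/14/2001 09:04'),
    (N'No wonder this case is not progressing, the whole town is covering up some shenanigans!', N'CEASTWOOD entered the notes above on 03/21/2001 05:23'),
    (N'Star command was right, this investigation has been tossed around like a hot potato for a long time!', N'BLIGHTYEAR entered the notes above on 08/29/2659 08:01');

declare @phases table (CaseID varchar(10), Username nvarchar(50), Phase varchar(5), PhaseTime datetime);
insert into @phases (CaseID, Username, Phase, PhaseTime) values
    ('E18902', N'JWAYNE',     'E1', '2001-03-14T09:13:00'),
    ('E18902', N'CEASTWOOD',  'E2', '2001-03-14T09:13:00'),
    ('E18902', N'CEASTWOOD',  'E3', '2001-03-21T05:34:00'),
    ('E18902', N'BLIGHTYEAR', 'E4', '2659-08-29T07:58:00');

-- Split each signature into username and note time. Assumes the format
-- '<USER> entered the notes above on mm/dd/yyyy hh:mi'.
declare @parsed table (Username nvarchar(50), NoteTime datetime, EntryData nvarchar(max));
insert into @parsed (Username, NoteTime, EntryData)
select  LEFT(EntryTimestamp, CHARINDEX(N' entered the notes above on ', EntryTimestamp) - 1),
        -- style 101 = mm/dd/yyyy; the hh:mi part is added as a separate datetime
        CONVERT(datetime, LEFT(RIGHT(EntryTimestamp, 16), 10), 101)
            + CAST(RIGHT(EntryTimestamp, 5) as datetime),
        EntryData
from    @entries;

-- For each phase row, take the same user's note whose time is closest to the
-- row's timestamp (ties go to the later note).
declare @matched table (Phase varchar(5), Username nvarchar(50), PhaseTime datetime,
                        NoteTime datetime, EntryData nvarchar(max));
insert into @matched (Phase, Username, PhaseTime, NoteTime, EntryData)
select  p.Phase, p.Username, p.PhaseTime, best.NoteTime, best.EntryData
from    @phases as p
outer apply (
    select top (1) e.NoteTime, e.EntryData
    from   @parsed as e
    where  e.Username = p.Username
    order by ABS(DATEDIFF(minute, e.NoteTime, p.PhaseTime)), e.NoteTime desc
) as best;

select * from @matched order by Phase;
```

With the example rows, CEASTWOOD's E2 row picks the 03/14/2001 09:04 note (9 minutes away) and the E3 row picks the 03/21/2001 05:23 note, which is exactly the case the backwards PATINDEX search gets wrong.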