regex, coldfusion, coldfusion-9

Read a substring between two strings in ColdFusion


In an HTML file, I need to extract the date string between two substrings.

Example string:

<A HREF="actuals_ADAPS_20150517_3.txt"></A> <A HREF="actuals_ADAPS_20150517_3.txt">actuals_ADAPS_20150517_3.t&gt;</A> May 17 00:50      4k <A HREF="actuals_ADAPS_20150518_1.txt"></A> <A HREF="actuals_ADAPS_20150518_1.txt">actuals_ADAPS_20150518_1.t&gt;</A> May 17 18:50      4k <A HREF="actuals_ADAPS_20150518_3.txt"></A> <A HREF="actuals_ADAPS_20150518_3.txt">actuals_ADAPS_20150518_3.t&gt;</A> May 18 00:50      4k

The example string represents three text files and their associated times.

I need to extract the times for each of these files.

I have tried the regex route but haven't been able to get the pattern right.

Code so far:

<cfdump var="#REMatch('actuals_METR_YOUNN_20150520_3.t&gt;(.*)4k',html)#">

The code is not right, but it gives an idea of where I am heading.

For filename: actuals_ADAPS_20150517_3.txt

Expected output:

May 17 00:50

Current output:

[Screenshot: REMatch output]

Note: As Leigh pointed out in the comments, REMatch (unfortunately) returns the entire matched string, not just the captured group as you might expect.
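If you want only the captured group without a second pass, REFind with its returnsubexpressions argument set to true is one alternative (this is not what the asker used). A minimal sketch, assuming the HTML is already in a variable named html and reusing the pattern from the Solution; the sample string and the variable names pattern and fileTime are illustrative:

```cfml
<cfscript>
// Illustrative input: the first listing entry from the question.
html = '<A HREF="actuals_ADAPS_20150517_3.txt">actuals_ADAPS_20150517_3.t&gt;</A> May 17 00:50      4k';
pattern = 'actuals_ADAPS_20150517_3\.t&gt;<\/A>\s*(.*?)\s*4k';

// With returnsubexpressions = true, REFind returns a struct with "pos" and "len" arrays:
// element 1 describes the whole match, element 2 the first parenthesized group.
m = reFind(pattern, html, 1, true);

fileTime = "";
if (m.pos[1] GT 0) {
    // Cut the first group out of the original string using its position and length.
    fileTime = mid(html, m.pos[2], m.len[2]);
}
</cfscript>
<cfoutput>#fileTime#</cfoutput>
```

Here fileTime ends up as "May 17 00:50". When there is no match, REFind reports position 0, which is why the code checks m.pos[1] before calling mid().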

I had to apply REReplace to the REMatch result to get the desired output.
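The final code was not posted, so the following is only an assumed reconstruction of the REMatch-then-REReplace workaround: REMatch finds the whole match, and REReplace then replaces that match with its first group via the back reference \1. The sample string and variable names are illustrative:

```cfml
<cfscript>
// Illustrative input: the first listing entry from the question.
html = '<A HREF="actuals_ADAPS_20150517_3.txt">actuals_ADAPS_20150517_3.t&gt;</A> May 17 00:50      4k';
pattern = 'actuals_ADAPS_20150517_3\.t&gt;<\/A>\s*(.*?)\s*4k';

// Step 1: REMatch returns an array of whole matches (the group is not extracted).
matches = reMatch(pattern, html);

fileTime = "";
if (arrayLen(matches)) {
    // Step 2: the pattern matches the entire string, so replacing it with \1
    // leaves only the text captured by (.*?).
    fileTime = reReplace(matches[1], pattern, "\1");
}
</cfscript>
<cfoutput>#fileTime#</cfoutput>
```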

Thanks all for helping.


Solution

  • You are on the right track. Just make your .* non-greedy (lazy) by adding ?, and exclude the surrounding spaces by adding \s* before and after the group.

    actuals_ADAPS_20150517_3\.t&gt;<\/A>\s*(.*?)\s*4k
    

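To get the time for every file rather than one hard-coded name, the same idea can be run in a loop with REFind. This is a sketch, not part of the original answer: the generalized pattern assumes every entry follows the layout in the question (an anchor whose link text contains no <, then the time, then a size such as 4k), and the variable names are illustrative. As in the answer, a lazy quantifier keeps the time group from running on into the next entry:

```cfml
<cfscript>
// Illustrative input: the full listing from the question.
html = '<A HREF="actuals_ADAPS_20150517_3.txt"></A> <A HREF="actuals_ADAPS_20150517_3.txt">actuals_ADAPS_20150517_3.t&gt;</A> May 17 00:50      4k <A HREF="actuals_ADAPS_20150518_1.txt"></A> <A HREF="actuals_ADAPS_20150518_1.txt">actuals_ADAPS_20150518_1.t&gt;</A> May 17 18:50      4k <A HREF="actuals_ADAPS_20150518_3.txt"></A> <A HREF="actuals_ADAPS_20150518_3.txt">actuals_ADAPS_20150518_3.t&gt;</A> May 18 00:50      4k';

// Group 1 = file name from HREF, group 2 = time.
// [^<]+ requires non-empty link text, so the empty <A ...></A> anchors never match;
// the lazy [^<]*? stops at the first whitespace run followed by a size like "4k".
pattern = '<A HREF="([^"]+)">[^<]+</A>\s*([^<]*?)\s+\d+k';

results = [];
start = 1;
while (start LTE len(html)) {
    m = reFind(pattern, html, start, true);
    if (m.pos[1] EQ 0) {
        break;  // no more matches
    }
    entry = structNew();
    entry.file = mid(html, m.pos[2], m.len[2]);
    entry.time = mid(html, m.pos[3], m.len[3]);
    arrayAppend(results, entry);
    start = m.pos[1] + m.len[1];  // resume searching after the current match
}
</cfscript>
<cfdump var="#results#">
```

With the sample input, results holds three entries, pairing each file name with May 17 00:50, May 17 18:50 and May 18 00:50 respectively. Taking the name from the HREF avoids the truncated link text (.t&gt;).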