In an HTML file, I need to extract the date string between two substrings.
Example string:
<A HREF="actuals_ADAPS_20150517_3.txt"></A> <A HREF="actuals_ADAPS_20150517_3.txt">actuals_ADAPS_20150517_3.t></A> May 17 00:50 4k <A HREF="actuals_ADAPS_20150518_1.txt"></A> <A HREF="actuals_ADAPS_20150518_1.txt">actuals_ADAPS_20150518_1.t></A> May 17 18:50 4k <A HREF="actuals_ADAPS_20150518_3.txt"></A> <A HREF="actuals_ADAPS_20150518_3.txt">actuals_ADAPS_20150518_3.t></A> May 18 00:50 4k
Example string represents 3 text files and their associated times.
I need to extract the times for each of these files.
I have tried the regex route but haven't been able to come up with the correct pattern.
Code so far:
<cfdump var="#REMatch('actuals_METR_YOUNN_20150520_3.t>(.*)4k',html)#">
The code is not right, but it gives an idea of where I am heading.
For filename: actuals_ADAPS_20150517_3.txt
Expected Output :
May 17 00:50
Current Output:
Note: As per Leigh in the comments, ReMatch (unfortunately) returns the entire string matched, instead of just the grouped expression as you would expect.
I had to use REReplace on top of ReMatch to get the desired output.
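A minimal sketch of that ReMatch plus REReplace combination, assuming the listing is held in a variable named html (shortened here to a single entry) and using the pattern from the answer below; the variable names are illustrative only:

```cfml
<cfscript>
// Shortened sample of the directory listing (assumed to be in a variable named html)
html = '<A HREF="actuals_ADAPS_20150517_3.txt">actuals_ADAPS_20150517_3.t></A> May 17 00:50 4k';

// Lazy capture group for the timestamp between the link text and the size column
pattern = 'actuals_ADAPS_20150517_3\.t></A>\s*(.*?)\s*4k';

// REMatch returns an array of whole matches, not just the captured group
matches = REMatch(pattern, html);

if (arrayLen(matches)) {
    // Re-apply the same pattern and keep only backreference \1
    fileTime = REReplace(matches[1], pattern, '\1');
    writeOutput(fileTime);
}
</cfscript>
```

Because the replacement runs against the string REMatch already matched, the whole match is swapped for the captured group and only the timestamp remains.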
Thanks all for helping.
You are on the right track. Just make your .* lazy (non-greedy) by adding ? and exclude the trailing space by adding \s* after it.
actuals_ADAPS_20150517_3\.t><\/A>\s*(.*?)\s*4k
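As an alternative to post-processing the ReMatch result, the captured group can be read directly with REFind and its returnsubexpressions argument. This is only a sketch under the assumption that the listing is in a variable named html (shortened here to one entry); fileTime and result are illustrative names:

```cfml
<cfscript>
// Shortened sample of the directory listing (assumed to be in a variable named html)
html = '<A HREF="actuals_ADAPS_20150517_3.txt">actuals_ADAPS_20150517_3.t></A> May 17 00:50 4k';

pattern = 'actuals_ADAPS_20150517_3\.t></A>\s*(.*?)\s*4k';

// With returnsubexpressions = true, REFind returns a struct of pos and len arrays:
// element 1 describes the whole match, element 2 the first capture group
result = REFind(pattern, html, 1, true);

if (result.pos[1] GT 0) {
    // Cut the captured timestamp out of the original string
    fileTime = Mid(html, result.pos[2], result.len[2]);
    writeOutput(fileTime);
}
</cfscript>
```

When nothing matches, result.pos[1] is 0, so the guard skips the Mid call.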