
Get a string with HTML tags inside a larger string with ColdFusion regex


I'm new to regular expressions and could use some help.

I am trying to use ColdFusion's REReplace function to scrape data and extract the content I want.

This is what I have so far:

<cfoutput>
#REReplace("Remove this please <p>Make this Display Please</p> Remove this please", "", "", "All")#
</cfoutput>

What regular expression could take that string and return only "Make this Display Please"?


Solution

  • To extract a substring from a longer string, match everything up to the part you need, capture that part with a capturing group (...), and then match the rest of the string up to the end. The replacement string is the \1 back-reference, which refers to the text captured by the group, so the whole input is replaced by just the captured part.

    So, use

    <cfoutput>
    #REReplace("Remove this please <p>Make this Display Please</p> Remove this please", ".*<p>(.*?)</p>.*", "\1", "All")#
    </cfoutput>
    

    The regex matches:

    • .* - matches any characters except newlines from the start of the string; because it is greedy, it backtracks only as far as the last <p>, so if there are several <p> elements, the last one is captured
    • <p> - matches the literal <p>
    • (.*?) - captures 0 or more characters other than newline, as few as possible (here, up to the nearest </p>)
    • </p> - matches the literal </p>
    • .* - matches the rest of the text up to the end (no newlines)
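    A minimal sketch of how the greedy leading .* behaves when the input contains more than one <p> element. The sample string and the variable names html, lastPara and firstPara are made up for illustration. Making the leading quantifier lazy (.*?) is a small variation on the pattern above, not part of the original answer.

    ```cfml
    <!--- Hypothetical sample input with two paragraphs on one line. --->
    <cfset html = "Remove <p>first</p> keep out <p>second</p> Remove">

    <!--- Greedy leading .* backtracks only as far as the LAST <p>, so this captures "second". --->
    <cfset lastPara = REReplace(html, ".*<p>(.*?)</p>.*", "\1", "All")>

    <!--- Lazy leading .*? stops at the FIRST <p>, so this captures "first". --->
    <cfset firstPara = REReplace(html, ".*?<p>(.*?)</p>.*", "\1", "All")>

    <cfoutput>
    #lastPara#<br>
    #firstPara#
    </cfoutput>
    ```

    If the input has no <p>...</p> pair on a single line, the pattern does not match and REReplace returns the string unchanged, so you may want to check for that case before using the result.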

    To also match across newlines, use [\s\S] in place of each "." in the pattern, since [\s\S] matches any character, including a newline.
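    A sketch of the newline-tolerant variant, assuming the input spans several lines. The variable names nl, html and extracted are illustrative, and Chr(10) is used here to build the line breaks in the sample string.

    ```cfml
    <!--- Build a hypothetical multi-line input; Chr(10) is a line feed. --->
    <cfset nl = Chr(10)>
    <cfset html = "Remove this please#nl#<p>Make this#nl#Display Please</p>#nl#Remove this please">

    <!--- Every . from the original pattern is swapped for [\s\S], which also matches newlines. --->
    <cfset extracted = REReplace(html, "[\s\S]*<p>([\s\S]*?)</p>[\s\S]*", "\1", "All")>

    <!--- extracted now holds the paragraph text, including its inner line break. --->
    <cfoutput>#extracted#</cfoutput>
    ```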