Tags: regex, alteryx

RegEx: Remove pattern and everything after it


I have strings with the tags <p> and </p>. I want to get just everything in between the tags, but not the tags themselves.

I have gotten one half of the RegEx to work: ^[^_]*<p> handles the beginning, but I still need another RegEx to get rid of </p>.
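With the two-step replace approach the question describes, the missing second half is a pattern that matches </p> and everything after it, replaced with an empty string. Below is a minimal sketch using Python's re module, chosen only for illustration. The sample string is hypothetical, and the sketch assumes a single <p>...</p> pair with no underscore before <p>, since [^_]* stops at underscores.

```python
import re

# Hypothetical sample input (the question does not show its actual data).
text = "Intro text <p>keep this</p> trailing text"

# Step 1: the question's pattern strips everything from the start of the
# string up to and including <p> ([^_]* assumes no underscore precedes <p>).
without_prefix = re.sub(r"^[^_]*<p>", "", text)

# Step 2: strip </p> and everything after it. re.DOTALL lets . also match
# newlines, so trailing content on later lines is removed too.
result = re.sub(r"</p>.*", "", without_prefix, flags=re.DOTALL)

print(result)  # keep this
```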


Solution

  • Use a lookbehind and a lookahead to keep the tags out of the match, together with the /s modifier so that . also matches newlines:

    (?<=<p>).*?(?=</p>)
    

    Without the /s modifier, use [\s\S] in place of . so that any character, including a newline, is matched:

    (?<=<p>)[\s\S]*?(?=</p>)
    

    If the engine is Perl-compatible (Perl, PCRE), the pattern can be shortened with \K, which likewise keeps the opening tag out of the match:

    <p>\K.*?(?=</p>)
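The variants above can be checked with a short Python sketch; the sample string is hypothetical. re.DOTALL plays the role of /s. Python's built-in re module does not support \K, so a capture group is swapped in for that last variant: re.findall returns only the group, which likewise leaves the tags out of the result.

```python
import re

# Hypothetical sample input: two <p> blocks, the first spanning a newline.
text = "<p>first\nparagraph</p> between <p>second</p>"

# Lookbehind/lookahead with re.DOTALL (Python's equivalent of /s).
dotall = re.findall(r"(?<=<p>).*?(?=</p>)", text, flags=re.DOTALL)

# Same pattern without DOTALL: . cannot cross the newline, so the
# first block is missed entirely.
no_dotall = re.findall(r"(?<=<p>).*?(?=</p>)", text)

# [\s\S] matches any character, newlines included, without any flag.
any_char = re.findall(r"(?<=<p>)[\s\S]*?(?=</p>)", text)

# Stand-in for the \K variant: built-in re has no \K, so capture the
# content in a group; findall then returns only the group's text.
group = re.findall(r"<p>(.*?)</p>", text, flags=re.DOTALL)

print(dotall)     # ['first\nparagraph', 'second']
print(no_dotall)  # ['second']
```

The lazy quantifier *? matters here: with a greedy .* the DOTALL pattern would run from the first <p> to the last </p> and return one match spanning both blocks.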