RegEx: Match Linefeeds between two tags (or alternatively two asterisks)


I hope you can help me. I have the following string:

<floatingText>After Cesar had reported this, he \r\n
 jumped into his UFO and flew to space</floatingText>

I only want to match the \r\n between the tags, because I want to remove those line breaks (replace them with ''), so I need a regular expression that matches them. Instead of <floatingText></floatingText> I could also use asterisks (*) as delimiters, e.g.

*After Cesar had reported this, he \r\n
 jumped into his UFO and flew to space*
Story goes on ... with some text.
*Another section with \r\n
a linefeed within asterisks*

Here is what I've tried:

\*([\s\S]*?)\* matches everything from one * up to and including the next *, not just the line breaks. https://regexr.com/5hmnm

(\*)[^<]*(\*) matches everything from the first * to the last one, including the section in between (Story goes on ...), which is not what I want. https://regexr.com/5hmoe

I hope you can help me out, I'd be super thankful.


Solution

  • There are several options, but the most versatile is to match all the text between the delimiters and pass each match to a replacement callback, where you can remove all CR and LF characters:

    $text = "*After Cesar had reported this, he\r\n jumped into his UFO and flew to space*\r\nStory goes on ... with some text.\r\n*Another section with\r\na linefeed within asterisks*";
    echo preg_replace_callback('~\*[^*]*\*~', function($x) { return str_replace(["\r","\n"], '', $x[0]); }, $text);
    => *After Cesar had reported this, he jumped into his UFO and flew to space*
    Story goes on ... with some text.
    *Another section witha linefeed within asterisks*
    

    See the PHP demo.

    When the delimiters are tags instead, you just need to use either of

    '~<floatingText>.*?</floatingText>~s'
    '~<floatingText>[^<]*(?:<(?!/?floatingText>)[^<]*)*</floatingText>~'
    

    as the regex pattern.
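
For the tag variant, the same callback approach can be sketched as follows. This is only an illustration under some assumptions: the helper name removeLineBreaksInMatches and the sample $text are made up for this example and are not part of the original answer. The s flag on the first pattern lets . match line breaks too; the second pattern needs no flag because the negated class [^<] already matches them.

```php
<?php
// Hypothetical helper (name chosen for this sketch): removes CR and LF
// characters inside every match of $pattern, leaving the rest untouched.
function removeLineBreaksInMatches(string $pattern, string $text): string
{
    $result = preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is the whole match, delimiters included.
        return str_replace(["\r", "\n"], '', $m[0]);
    }, $text);

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $text;
}

// Assumed sample input, modelled on the string from the question.
$text = "<floatingText>After Cesar had reported this, he\r\n jumped into his UFO and flew to space</floatingText>\r\nStory goes on ... with some text.";

// The two patterns given above.
$lazy     = '~<floatingText>.*?</floatingText>~s';
$unrolled = '~<floatingText>[^<]*(?:<(?!/?floatingText>)[^<]*)*</floatingText>~';

echo removeLineBreaksInMatches($lazy, $text), "\n";
echo removeLineBreaksInMatches($unrolled, $text), "\n";
```

On this input both patterns should give the same result: the line break inside the tags is removed, while the one after the closing tag is kept.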