
preg_match_all - get part between pattern and string


In this example I have a large string with many defined elements in it; part of the example string is here. With my regex I get a match from the example string that starts at the first tilde (~32) and ends at ~120 3, which is what my regex should do. But I need to update the regex so that it gets the closest match searching in reverse from ~120 3, so that the result is:

PRIEDE EGLE BERZS LAPU KOKI

<?php
// $str holds the large input string from the question (not reproduced here)
$regex = '/~[1-9](.*?)\~120 3/s';
preg_match($regex, $str, $matches);

echo '<pre>';
print_r($matches);
exit();
?>
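A minimal sketch of why the lazy `.*?` does not help, using a made-up sample string (the real input is not shown in the question, so `$str` below is hypothetical). The engine tries start positions from left to right and keeps the first one that leads to a match, so the match begins at the leftmost `~` plus digit; laziness only decides where the match ends.

```php
<?php
// Hypothetical input: the real string from the question is not reproduced here.
$str = "~32 ALFA BETA ~45 PRIEDE EGLE BERZS LAPU KOKI~120 3 GAMMA";

// The question's pattern: the lazy .*? stops at the first "~120 3",
// but the match still starts at the leftmost "~" followed by 1-9.
$regex = '/~[1-9](.*?)\~120 3/s';
preg_match($regex, $str, $matches);

// Group 1 spans everything from "~3" up to "~120 3", including "~45".
echo $matches[1], "\n"; // 2 ALFA BETA ~45 PRIEDE EGLE BERZS LAPU KOKI
```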

So the question is: how do I set the direction so that part of the string is matched in "reverse"? That is, once ~120 3 is matched, how do I take everything backwards from ~120 3 until a tilde followed by a digit (~[1-9]) is matched?

Attached is an image of my current regex result, with a few elements marked:
* Green - the element I know, where in my imagination the reverse search would start.
* Grey - the correct result.
* Red - the first match found when searching in reverse from ~120 3.


Thanks for recommendations in advance!


Solution

  • So the question is:

    How do I set the direction so that part of the string is matched in "reverse"? If I match ~120 3, how do I get everything from ~120 3 backwards until I match a tilde followed by a digit (~[1-9])?

    It is not possible to change the matching direction of the regex engine: PHP's preg_* functions use PCRE, which always scans the input from left to right. However, you may use a lookahead to restrict what text can be matched.
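If you really want a literal backwards search, it has to be done in plain PHP rather than by the regex engine. A rough sketch, assuming the start delimiter is a `~` followed by one digit 1-9, as in `~[1-9]`; `findBackwards()` is a hypothetical helper name and the sample string is invented:

```php
<?php
// Sketch of a backwards search done in plain PHP, not by the regex engine.
// findBackwards() is a hypothetical helper, not a built-in function.
function findBackwards(string $str, string $endMarker): ?string
{
    // Locate the end delimiter first, scanning forwards as usual.
    $endPos = strpos($str, $endMarker);
    if ($endPos === false) {
        return null;
    }

    // Walk backwards from just before the end delimiter until we hit
    // a "~" followed by a digit 1-9 (the same start delimiter as ~[1-9]).
    for ($i = $endPos - 1; $i >= 0; $i--) {
        if ($str[$i] === '~' && strpos('123456789', $str[$i + 1]) !== false) {
            // Return the text between "~<digit>" and the end delimiter.
            return substr($str, $i + 2, $endPos - $i - 2);
        }
    }

    return null; // no start delimiter before the end delimiter
}

$str = "~32 ALFA BETA ~45 PRIEDE EGLE BERZS LAPU KOKI~120 3 GAMMA"; // hypothetical input
$result = findBackwards($str, '~120 3');
echo $result, "\n"; // 5 PRIEDE EGLE BERZS LAPU KOKI
```

On this sample it returns the same text that a `~[1-9]` start delimiter captures, leftover digit included; a single regex keeps the logic in one `preg_match` call instead.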

    According to your requirements, you need:

    ~[1-9]([^~]*(?:~(?![1-9])[^~]*)*)~120 3
    

    See the regex demo.

    Details:

    • ~[1-9] - your initial delimiter
    • ([^~]*(?:~(?![1-9])[^~]*)*) - Capturing group 1 matching:
      • [^~]* - any 0+ chars other than tilde
      • (?:~(?![1-9])[^~]*)* - 0+ sequences of:
        • ~(?![1-9]) - a tilde that is not followed by a digit from 1 to 9
        • [^~]* - any 0+ chars other than tilde
    • ~120 3 - end delimiter
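A sketch of that pattern in PHP, on a hypothetical sample string (not the asker's real data). The attempt that starts at `~32` cannot step over `~45`, so it fails, and the first successful attempt starts at the `~` plus digit closest to `~120 3`:

```php
<?php
$str = "~32 ALFA BETA ~45 PRIEDE EGLE BERZS LAPU KOKI~120 3 GAMMA"; // hypothetical input

// The pattern from the answer, wrapped in PHP delimiters.
// [^~] already matches newlines, so the /s flag is not needed here.
$regex = '/~[1-9]([^~]*(?:~(?![1-9])[^~]*)*)~120 3/';
preg_match($regex, $str, $matches);

// Group 1 can never contain "~<digit>", so the match starts at "~45".
echo $matches[1], "\n"; // 5 PRIEDE EGLE BERZS LAPU KOKI
```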

    However, this won't capture exactly what you need, since the capture will include some digits and a space at the start. Perhaps your starting delimiter should be ~[\d\s]+, and the lookahead should then be (?![\d\s]+). See another demo.
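A sketch of that suggestion, again on an invented sample string. The `trim()` call is an addition, assuming the real data may leave whitespace right before `~120 3`:

```php
<?php
$str = "~32 ALFA BETA ~45 PRIEDE EGLE BERZS LAPU KOKI~120 3 GAMMA"; // hypothetical input

// The start delimiter now consumes all digits and whitespace after "~",
// and an inner "~" is allowed only if it is not followed by a digit or whitespace.
$regex = '/~[\d\s]+([^~]*(?:~(?![\d\s]+)[^~]*)*)~120 3/';

$result = null;
if (preg_match($regex, $str, $matches)) {
    // trim() drops any whitespace left right before "~120 3" in real data.
    $result = trim($matches[1]);
}
echo $result, "\n"; // PRIEDE EGLE BERZS LAPU KOKI
```

`(?![\d\s]+)` behaves the same as `(?![\d\s])`, because a negative lookahead only checks whether at least one such character follows. Note that with this delimiter a tilde followed by a space also counts as a start.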