php regex string contain

Finding words in between other words


I have a sentence whose words vary in number and length. In the sentence below, each x stands for an arbitrary word, while x1, x2, and x3 are fixed words that never change. First, I need to check whether there is an x between x1 and x2, or between x2 and x3. If there is, I need that x's value. How would I do that?

x x1 x x2 x3 x

P.S. There can be more than one x between x1, x2, and x3, and there can also be more than one x to the left and to the right.


Solution

  • preg_match_all('/x1\\s+(.+?)\\s+x2\\s+(.+?)\\s+x3/i', $string, $matches);
    

    will put the desired content into $matches[1] (the text between x1 and x2) and $matches[2] (the text between x2 and x3). The regular expression searches for all occurrences of x1 followed by whitespace, some other text, and whitespace, then x2, another whitespace-text-whitespace sequence, and finally x3. The trailing i modifier makes the match case-insensitive.
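
    One caveat: because every gap is written as \s+(.+?)\s+, the pattern needs at least one character in each gap, so the sample sentence from the question (which has nothing between x2 and x3) does not match at all. The following is only a sketch of one possible variant, not part of the original answer: it assumes an empty gap should be reported as an empty string, and it adds \b word boundaries so the fixed words do not match inside longer words.

    ```php
    <?php

    // Variant pattern (an assumption, not the pattern from the answer):
    // \s* and .*? allow a gap to be empty, \b anchors the fixed words.
    $pattern = '/\bx1\b\s*(.*?)\s*\bx2\b\s*(.*?)\s*\bx3\b/i';

    $string = 'x x1 x x2 x3 x';

    if (preg_match($pattern, $string, $m)) {
        // An empty string means there was nothing in that gap.
        $betweenX1AndX2 = $m[1];
        $betweenX2AndX3 = $m[2];

        var_dump($betweenX1AndX2 !== ''); // is there an x between x1 and x2?
        var_dump($betweenX2AndX3 !== ''); // is there an x between x2 and x3?
    }
    ```

    If an empty gap should count as "no match" instead, the original pattern already behaves that way.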

    If you want the strings in between as separate words, you could run preg_split('/\s+/', ...) on them. The regular expression above could also be adapted to do that, but it would make retrieving the results more complicated.
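
    As a minimal sketch of that splitting step, using the same sentence as the example in this answer: the snippet assumes the words are separated by runs of whitespace, and the variable name $words is only illustrative.

    ```php
    <?php

    $string = 'The quick brown fox jumps over the lazy dog';
    preg_match_all('/quick\s+(.+?)\s+fox\s+(.+?)\s+lazy/', $string, $matches);

    $words = [];
    if (!empty($matches[2])) {
        // Split the text captured between "fox" and "lazy" on whitespace.
        // PREG_SPLIT_NO_EMPTY drops any empty pieces.
        $words = preg_split('/\s+/', $matches[2][0], -1, PREG_SPLIT_NO_EMPTY);
    }

    print_r($words); // the three words "jumps", "over", "the"
    ```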

    Example:

    <?php
    
    $string = 'The quick brown fox jumps over the lazy dog';
    preg_match_all('/quick\\s+(.+?)\\s+fox\\s+(.+?)\\s+lazy/', $string, $matches);
    var_dump($matches);
    
    ?>
    

    yields

    array(3) {
      [0]=>
      array(1) {
        [0]=>
        string(35) "quick brown fox jumps over the lazy"
      }
      [1]=>
      array(1) {
        [0]=>
        string(5) "brown"
      }
      [2]=>
      array(1) {
        [0]=>
        string(14) "jumps over the"
      }
    }
    

    which is, IMHO, the correct result.

    As you can see, $matches[1][0] contains the words between quick (x1) and fox (x2), and $matches[2][0] contains the words between fox (x2) and lazy (x3). If more occurrences are found, they are stored under $matches[1][1] and $matches[2][1] and so on, with the second index counting up. It is sufficient to iterate over the indexes of $matches[0]: every occurrence has an entry at the same index in $matches[0] (the complete match) and in $matches[1] and $matches[2] (the two partial matches).
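
    A short sketch of that iteration, for a sentence in which the pattern occurs twice. The longer sample sentence and the $results array are made up for illustration.

    ```php
    <?php

    // Hypothetical sentence containing two occurrences of the pattern.
    $string = 'a quick brown fox jumps over the lazy dog and '
            . 'a quick red fox hops past a lazy cat';

    $count = preg_match_all('/quick\s+(.+?)\s+fox\s+(.+?)\s+lazy/', $string, $matches);

    $results = [];
    foreach (array_keys($matches[0]) as $i) {
        // Index $i refers to the same occurrence in all three sub-arrays.
        $results[] = [$matches[1][$i], $matches[2][$i]];
    }

    print_r($results);
    ```

    Each entry of $results then holds the pair of in-between strings for one occurrence. Passing PREG_SET_ORDER as the fourth argument to preg_match_all is an alternative that groups the results per occurrence directly.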