Parse/Split a forward slash delimited string


This is more of a generic regex question than a PHP-specific one.

I am given different strings that may look like:

A/B/PA ID U/C/D

And I'm trying to extract the segment between the middle slashes that contains spaces ("/PA ID U") using:

preg_match('/(\/PA .+)(\/.+|$)/', $string, $matches);

However, instead of getting "/PA ID U" as I was expecting, I was getting "/PA ID U/C/D".

How can I make it prioritize matching "/.+" over "$" in that last group?


Additional notes:

I need that last group to match either another "/something" segment or the end of the string (""), because the input varies a lot. If I only match "/.+", I won't be able to get "/PA ID U" when it's at the end of the line, as in "A/B/PA ID U".

Basically, I need to be able to extract specific segments like so:

Given: "A/B/PA ID U/PA ID U/C/D"

Extract: (A), (B), (PA ID U), (PA ID U), (C), (D)


[UPDATE]

I'm trying to avoid using split() or explode() because that would mean that I have to match the "PA ID U" pattern separately. Aside from merely extracting the slash-separated segments, I need to validate that the substrings match specific patterns.


Solution

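  • Your regular expression is not working because the first .+ is greedy: it consumes everything up to the end of the string, and the $ alternative in the last group then matches there. You can fix it by making that first .+ non-greedy (appending a ?), like so: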

    preg_match('/(\/PA .+?)(\/.+|$)/', $string, $matches);
    

    You could alternatively do:

    '/\/(PA [^\/]+)(\/.+|$)/'
    

    I moved the slash outside of the parens to avoid capturing it (I presume you're not interested in the slash). The [^\/]+ matches one or more characters that are not a slash, so it stops right before the next slash or at the end of the string.
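
To make the difference concrete, here is a minimal sketch that runs the original pattern and both fixes against the sample string from the question, then pulls out every segment with preg_match_all. The helper names extractSegments and isPaSegment are hypothetical, and the rule used to recognise a "PA" segment is an assumption made for illustration only; substitute whatever validation your data actually requires.

```php
<?php
// Sample input taken from the question.
$string = 'A/B/PA ID U/C/D';

// Greedy: the first .+ runs to the end of the string, so "$" satisfies group 2.
preg_match('/(\/PA .+)(\/.+|$)/', $string, $greedy);

// Lazy: .+? grows one character at a time until group 2 can match.
preg_match('/(\/PA .+?)(\/.+|$)/', $string, $lazy);

// Negated class: [^\/]+ can never cross a slash, and the slash is not captured.
preg_match('/\/(PA [^\/]+)(\/.+|$)/', $string, $negated);

echo $greedy[1], "\n";   // /PA ID U/C/D
echo $lazy[1], "\n";     // /PA ID U
echo $negated[1], "\n";  // PA ID U

// Hypothetical helper: return every slash-delimited segment without explode().
function extractSegments(string $input): array
{
    // Each match is one run of non-slash characters, i.e. one segment.
    preg_match_all('/[^\/]+/', $input, $matches);
    return $matches[0];
}

// Hypothetical helper. Assumed rule: "PA " followed by at least one character.
function isPaSegment(string $segment): bool
{
    return preg_match('/^PA .+$/', $segment) === 1;
}

$segments = extractSegments('A/B/PA ID U/PA ID U/C/D');
foreach ($segments as $segment) {
    echo $segment, isPaSegment($segment) ? " (PA segment)\n" : "\n";
}
```

Both fixes also handle the case where the segment is last in the string (for example "A/B/PA ID U"), because the $ alternative in the final group still matches there.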