php regex pcre

Regex for substring of comma separated list


I'm a beginner to regex, so I apologize in advance if this is a naive question!

I have a string with two values separated by a comma: 12.345678,23.45678901

I am trying to use regex (this is a requirement) to return the first value with 3 decimals 12.345 and the second value with 2 decimals 23.45.

Ideally, the full regex match would be 12.345,23.45

I am able to get the first value 12.345 using the following regex: ^\d+\.\d{0,3}.

This works well because it only returns the full match (there is no Group 1 match). But I'm pretty stumped on how to get the second value 23.45 to be returned in the same string.

I've also tried this regex: (^.{0,6})(?:.*)(,)(.{0,5}), which correctly parses the first and second values, but the full match is being returned with too many decimals.

Full match: 12.345678,23.45

Group 1: 12.345

Group 2: ,

Group 3: 23.45

Any suggestions are welcome! Thank you in advance.


Solution

  • You can use this regex to get your data:

    ^(\d+\.\d{3})\d*,(\d+\.\d{2})\d*$
    

    It matches one or more digits, a . and exactly 3 decimal digits (first capture group), then any remaining digits and the comma (matched but not captured), then one or more digits, a . and exactly 2 decimal digits (second capture group), and finally any remaining digits up to the end of the string (matched but not captured).

    To use it in PHP:

    $str = '12.345678,23.45678901';
    preg_match('/^(\d+\.\d{3})\d*,(\d+\.\d{2})\d*$/', $str, $matches);
    echo "first number: {$matches[1]}\nsecond number: {$matches[2]}\n";
    

    Output:

    first number: 12.345
    second number: 23.45
    

    Demo on 3v4l.org
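Since the question asks for the single string 12.345,23.45, one possible way to get it from the same pattern is to rebuild the string from the two capture groups with preg_replace. This is only a sketch: the variable names $pattern and $truncated are illustrative, and it assumes the input always has the two-number shape shown in the question.

```php
<?php
$str = '12.345678,23.45678901';

// Same pattern as above: group 1 keeps 3 decimals, group 2 keeps 2 decimals.
$pattern = '/^(\d+\.\d{3})\d*,(\d+\.\d{2})\d*$/';

// Replace the whole match with "group1,group2", dropping the extra digits.
$truncated = preg_replace($pattern, '$1,$2', $str);

echo $truncated, "\n";
```

Note that preg_replace returns the input unchanged when the pattern does not match, so a value with fewer decimals than required would pass through untouched.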

    If you need to get both matches in the $matches[0] array (using preg_match_all), you can use this regex:

    (?<=^)\d+\.\d{3}(?=\d*,)|(?<=,)\d+\.\d{2}(?=\d*$)
    

    This regex looks for either

    • the start of string followed by some digits, a . and 3 digits (followed by some number of digits and a comma); or
    • a comma, some number of digits, a . and 2 digits (followed by some number of digits and the end of string).

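    To keep the unwanted characters out of the match, they are checked with positive lookaheads and lookbehinds, which assert that the text is present without consuming it. (The (?<=^) lookbehind is equivalent to a plain ^, since ^ is already a zero-width assertion.)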

    Demo on 3v4l.org
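A minimal sketch of how the lookaround pattern could be used with preg_match_all, assuming the same sample string as in the question. The names $pattern, $count and $joined are illustrative only.

```php
<?php
$str = '12.345678,23.45678901';

// Lookarounds keep the extra digits and the comma out of each full match.
$pattern = '/(?<=^)\d+\.\d{3}(?=\d*,)|(?<=,)\d+\.\d{2}(?=\d*$)/';

// $matches[0] holds every full match, in order of appearance.
$count = preg_match_all($pattern, $str, $matches);

// Join the two truncated values back into one comma-separated string.
$joined = implode(',', $matches[0]);

echo "matches found: {$count}\n";
echo $joined, "\n";
```

Here $matches[0] contains the two truncated values as separate elements, so joining them with a comma gives the single string the question asks for.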