Tags: regex, preg-match, gs1-ai-syntax, gs1-128

Preg_match / split barcode


I am struggling with reading a GS1-128 barcode and splitting it into the segments it contains, so I can fill out a form automatically.

But I can't figure it out. Scanning my barcode gives me the following: ]d2010704626096200210KT0BT2204[GS]1726090021RNM5F8CTMMBHZSY7

So I tried starting with preg_match and made the following:

/]d2[01]{2}\d{14}[10|17|21]{2}(\w+)/

Which gives me this result:

Array ( [0] => ]d2010704626096200210KT0BT2204 [1] => KT0BT2204 )

Now [1] is actually correct, but [0] isn't, so I have run into a wall.

In the end, this is the result I would like (without 01,10,17,21):

(01) 07046260962002
(10) KT0BT2204
(17) 260900
(21) RNM5F8CTMMBHZSY7

01 - Always followed by 14 characters
17 - Always followed by 6 characters

10 - Followed by up to 20 characters and terminated by the [GS] delimiter, unless it is the last element in the barcode, in which case [GS] is not present

21 - Followed by up to 20 characters and terminated by the [GS] delimiter, unless it is the last element in the barcode, in which case [GS] is not present

I tried following this question: GS1-128 and RegEx, but I couldn't figure it out.

Can anyone help me?


Solution

  • This regex should do what you want. It is split across several lines for clarity; you can use it as shown with the x (extended) flag, or collapse it back onto one line:

    ^]d2(?:
    01(?P<g01>.{14})|
    10(?P<g10>(?:(?!\[GS]).){1,20})(?:\[GS]|$)|
    17(?P<g17>.{6})|
    21(?P<g21>(?:(?!\[GS]).){1,20})(?:\[GS]|$)
    )+$
    

    It looks for

    • start of string ^ followed by a literal ]d2, then one or more of:
    • 01 followed by exactly 14 characters (captured in group g01)
    • 10 followed by 1 to 20 characters, terminated by either [GS] or end of string (captured in group g10)
    • 17 followed by exactly 6 characters (captured in group g17)
    • 21 followed by 1 to 20 characters, terminated by either [GS] or end of string (captured in group g21)
    • finishing with end of string $

    Note that we need to use tempered greedy tokens, (?:(?!\[GS]).), which only consume a character when it does not begin a [GS] delimiter. Without them, a 10 or 21 code might swallow a following code (as in the second example in the regex demo below).

    Demo on regex101
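To make the swallowing problem concrete, here is a minimal sketch comparing a plain .{1,20} with the tempered token on just the tail of the sample barcode. The group name lot and the variables $tail, $naive and $tempered are made up for this illustration and are not part of the answer's regex.

```php
<?php
// Tail of the sample barcode: AI 10, its value, the [GS] delimiter, then AI 17.
$tail = '10101656[GS]17261130';

// Naive version: .{1,20} is greedy and happily runs across the [GS] delimiter.
preg_match('/^10(?P<lot>.{1,20})(?:\[GS]|$)/', $tail, $naive);

// Tempered version: each character is consumed only if [GS] does not start there.
preg_match('/^10(?P<lot>(?:(?!\[GS]).){1,20})(?:\[GS]|$)/', $tail, $tempered);

echo $naive['lot'], "\n";    // 101656[GS]17261130 (swallowed the following 17 element)
echo $tempered['lot'], "\n"; // 101656
```

The naive pattern still matches, which is what makes the bug easy to miss: it simply reports the wrong value for the variable-length field.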

    In PHP:

    $barcode = ']d201070462608682672140097289158930[GS]10101656[GS]17261130';
    
    preg_match_all('/^]d2(?:
    01(?P<g01>.{14})|
    10(?P<g10>(?:(?!\[GS]).){1,20})(?:\[GS]|$)|
    17(?P<g17>.{6})|
    21(?P<g21>(?:(?!\[GS]).){1,20})(?:\[GS]|$)
    )+$/x', $barcode, $matches);
    
    print_r($matches);
    

    Output:

    Array
    (
        [0] => Array
            (
                [0] => ]d201070462608682672140097289158930[GS]10101656[GS]17261130
            )
    
        [g01] => Array
            (
                [0] => 07046260868267
            )
    
        [1] => Array
            (
                [0] => 07046260868267
            )
    
        [g10] => Array
            (
                [0] => 101656
            )
    
        [2] => Array
            (
                [0] => 101656
            )
    
        [g17] => Array
            (
                [0] => 261130
            )
    
        [3] => Array
            (
                [0] => 261130
            )
    
        [g21] => Array
            (
                [0] => 40097289158930
            )
    
        [4] => Array
            (
                [0] => 40097289158930
            )
    
    )
    

    Demo on 3v4l.org
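For filling out a form, a flat array keyed by application identifier may be handier than the nested preg_match_all result. The following is only a sketch under a few assumptions: parseGs1() is a hypothetical helper name, it uses preg_match instead of preg_match_all (the anchored pattern can match at most once anyway), and it assumes each identifier appears at most once, since a repeated group keeps only its last capture. It is run here against the barcode from the question.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the captured
// values keyed by application identifier, or null if the barcode does not match.
function parseGs1(string $barcode): ?array
{
    $pattern = '/^]d2(?:
        01(?P<g01>.{14})|
        10(?P<g10>(?:(?!\[GS]).){1,20})(?:\[GS]|$)|
        17(?P<g17>.{6})|
        21(?P<g21>(?:(?!\[GS]).){1,20})(?:\[GS]|$)
    )+$/x';

    if (preg_match($pattern, $barcode, $m) !== 1) {
        return null;
    }

    $fields = [];
    foreach (['01', '10', '17', '21'] as $ai) {
        // Groups that did not take part in the match are missing or empty.
        $value = $m['g' . $ai] ?? '';
        if ($value !== '') {
            $fields[$ai] = $value;
        }
    }
    return $fields;
}

$fields = parseGs1(']d2010704626096200210KT0BT2204[GS]1726090021RNM5F8CTMMBHZSY7');
print_r($fields);
// 01 => 07046260962002, 10 => KT0BT2204, 17 => 260900, 21 => RNM5F8CTMMBHZSY7
```

Note that PHP turns the keys 10, 17 and 21 into integers while 01 stays a string because of its leading zero; lookups such as $fields['10'] work either way.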