Regex to capture key-value comma-separated values


I'm trying to write a regular expression to parse the values out of UnrealScript serialized objects. Part of that involves lines like this:

(X=32.69,Y='123.321',Z="A string with commas, just to complicate things!",W=Class'Some.Class')

The resultant capture should be:

[
    {
        'X':32.69,
        'Y':'123.321',
        'Z':'A string with commas, just to complicate things!',
        'W':'Class\'Some.Class\''
    }
]

What I want is to be able to distinguish between the key (e.g. X) and the value (e.g. Class\'Some.Class\').

Here is a pattern I've tried so far, just to capture a simple set of values (it doesn't try to handle commas inside values yet):

Pattern

\(((\S?)=(.+),?)+\)

Data set

(X=32,Y=3253,Z=12.21)

Result

https://regex101.com/r/gT9uU3/1

I'm still a novice with regular expressions, and any help would be appreciated!

Thanks in advance.


Solution

  • You can try this regex to associate key and value pairs:

    (?!^\()([^=,]+)=([^\0]+?)(?=,[^,]+=|\)$)
    

    Explanation:

    (?!^\()         # do not start the match at the initial '(' character
    
    ([^=,]+)        # the key: everything after the previous comma
    =               # up to the next '=' character
    
    ([^\0]+?)       # the value: any characters '[^\0]',
                      # at least one of them '+',
                      # but as few as possible '?' (lazy), so it stops
                      # at the first position where the lookahead below succeeds
    
    (?=             # where does the value stop?
    
        ,[^,]+=     # before a comma ',' that is followed by a key '[^,]+='
                      # important: without the key check, the value would
                      # stop at the first comma, which may or may not
                      # be a delimiter comma
    
        |\)$        # OR '|': the value may be the last one,
                      # with no further key after it,
                      # so it is also allowed to stop before
                      # a ')' at the end of the string '$'
    
    )               # end of the lookahead
    

    Hope it helps.

    Sorry for my English; feel free to edit my explanation, or let me know in the comments. =)
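
To see the pattern from the answer at work, here is a minimal Python sketch. The choice of Python, the helper name `parse_pairs`, and the variable `sample` are assumptions made for illustration and are not part of the original answer. Values are returned as raw strings, with their quotes left in place.

```python
import re

# Pattern copied from the answer above. The leading lookahead keeps the opening
# '(' out of the first key; the trailing lookahead ends a value at a comma that
# is followed by another key, or at a ')' at the end of the string.
PAIR = re.compile(r"(?!^\()([^=,]+)=([^\0]+?)(?=,[^,]+=|\)$)")


def parse_pairs(line):
    """Hypothetical helper: return the key/value pairs of one line as a dict of strings."""
    return {m.group(1): m.group(2) for m in PAIR.finditer(line)}


# The example line from the question.
sample = """(X=32.69,Y='123.321',Z="A string with commas, just to complicate things!",W=Class'Some.Class')"""

for key, value in parse_pairs(sample).items():
    print(key, "->", value)
```

One limitation worth knowing: the pattern does not track quotes, so a quoted value that itself contains a comma followed later by an '=' (for example `Z="a, b=c"`) would be split at that comma. For input like that, a small hand-written tokenizer is probably a safer choice than a single regex.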