
php: better way to split string into associative array


I have a string like this:

"ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999"

and my goal is to split it into an associative array:

Array
(
    [ALARM_ID/I4] => 1010001
    [ALARM_STATE/U4] => eventcode
    [ALARM_TEXT/A] => WMR_MAP_EXPORT
    [LOTS/A[1]] => [ STEFANO ]
    [ALARM_STATE/U1] => 1
    [WAFER/U4] => 1
    [VI_KLARF_MAP/A] => /test/klarf.map
    [KLARF_STEPID/A] => StepID
    [KLARF_DEVICEID/A] => DeviceID
    [KLARF_EQUIPMENTID/A] => EquipmentID
    [KLARF_SETUP_ID/A] => SetupID
    [RULE_ID/U4] => 1234
    [RULE_FORMULA_EXPRESSION/A] => a < b && c > d
    [RULE_FORMULA_TEXT/A] => 1 < 0 && 2 > 3
    [RULE_FORMULA_RESULT/A] => FAIL
    [TIMESTAMP/A] => 10-Nov-2020 09:10:11 99999999
)

The only (and perhaps dirty) way I have found is this script:

<?php
$msg = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";
$split = explode("=", $msg);
foreach($split as $k => $s) {
    $s = explode(" ", $s);
    $keys[] = array_pop($s);
    if ($s) $values[] = implode(" ", $s);
}
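/*
 * The last chunk is the final value, so its last word was pushed onto $keys by mistake.
 * If that value contains no spaces, it is missing from $values entirely: move it over.
 * Otherwise, append the popped word back onto the last value.
 */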
if (count($values) + 2 == count($keys)) array_push($values, array_pop($keys));
else                                    $values[ count($values) - 1 ] .= " " . array_pop($keys);
$params = array_combine($keys, $values);
print_r($params);
?>

Do you see a better way to split it, maybe using a regular expression or a different (more elegant?) approach?


Solution

  • You could leverage the presence of a / in all the keys:

    ([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)
    

    Explanation

    • ( Capture group 1
      • [^\s=/]+ Match 1+ times any char except a whitespace char, = or /
      • /[^\s=]+ Then match / followed by the rest of the key
    • ) Close group 1
    • = Match literally
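    • (.*?) Capture group 2, match any char except a newline, as few times as possible
    • (?=\h+[^\s=/]+/|$) Positive lookahead, assert that what follows is either 1+ horizontal whitespace chars and a key-like format containing a / (as used in group 1), or the end of the string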
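
To see how the lazy group and the lookahead cooperate, here is a minimal sketch on a short, made-up input (the keys `A/I4`, `B/A`, `C/A` and their values are invented for illustration). One value contains a space and another contains a slash, yet each value still ends right before the next key.

```php
<?php
// Made-up input: the second value contains a space, the third starts with a slash.
$str = 'A/I4=1 B/A=two words C/A=/tmp/x.map';
$re  = '`([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)`';

// PREG_SET_ORDER yields one entry per match: [full match, key, value]
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[$match[1]] = $match[2];
}

print_r($pairs);
```

This prints the pairs `A/I4 => 1`, `B/A => two words` and `C/A => /tmp/x.map`. Note that the lookahead treats any whitespace followed by `something/` as the start of a new key, so a value such as `foo a/b` would be cut after `foo`.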


    Example code

    $re = '`([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)`';
    $str = 'ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999';
    
    preg_match_all($re, $str, $matches);
    $result = array_combine($matches[1], $matches[2]);
    
    print_r($result);
    

    Output

    Array
    (
        [ALARM_ID/I4] => 1010001
        [ALARM_STATE/U4] => eventcode
        [ALARM_TEXT/A] => WMR_MAP_EXPORT
        [LOTS/A[1]] => [ STEFANO ]
        [ALARM_STATE/U1] => 1
        [WAFER/U4] => 1
        [VI_KLARF_MAP/A] => /test/klarf.map
        [KLARF_STEPID/A] => StepID
        [KLARF_DEVICEID/A] => DeviceID
        [KLARF_EQUIPMENTID/A] => EquipmentID
        [KLARF_SETUP_ID/A] => SetupID
        [RULE_ID/U4] => 1234
        [RULE_FORMULA_EXPRESSION/A] => a < b && c > d
        [RULE_FORMULA_TEXT/A] => 1 < 0 && 2 > 3
        [RULE_FORMULA_RESULT/A] => FAIL
        [TIMESTAMP/A] => 10-Nov-2020 09:10:11 99999999
    )
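
If the same parsing is needed in several places, the pattern could be wrapped in a small helper. The following is only a sketch and not part of the original answer: the function name `parseMessage` is hypothetical, and it assumes that an empty array is an acceptable result when no key/value pair is found.

```php
<?php
/**
 * Hypothetical helper: split a "KEY/TYPE=value KEY/TYPE=value ..." message
 * into an associative array using the pattern from the answer.
 */
function parseMessage(string $msg): array
{
    $re = '`([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)`';

    // trim() drops surrounding whitespace so it cannot end up in the last value
    if (!preg_match_all($re, trim($msg), $matches)) {
        return []; // no pair found, or the regex engine reported an error
    }

    // Group 1 holds the keys, group 2 the values
    return array_combine($matches[1], $matches[2]);
}

$params = parseMessage("RULE_ID/U4=1234 RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 TIMESTAMP/A=10-Nov-2020 09:10:11 99999999\n");
print_r($params);
```

Keep in mind that `array_combine` keeps only the last value when the same key occurs twice. If a message can repeat a key, looping over the matches and collecting the values per key would be a safer choice.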
    

    If every key must start with groups of word characters separated by single underscores, you can begin the pattern with the repeating part [^\W_]+(?:_[^\W_]+)*

    It matches 1+ word chars other than _, and then repeats matching _ followed by 1+ word chars other than _, until it reaches the /

    ([^\W_]+(?:_[^\W_]+)*/[^\s=]*)=(.*?)(?=\h+[^\s=/]+/|$)
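
A minimal sketch of the stricter pattern in use, run against a shortened version of the sample string (the shortening is only for readability). It is applied exactly like the first pattern.

```php
<?php
// Stricter variant: the part before "/" is letters/digits joined by single underscores.
$re  = '`([^\W_]+(?:_[^\W_]+)*/[^\s=]*)=(.*?)(?=\h+[^\s=/]+/|$)`';
$str = 'KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 TIMESTAMP/A=10-Nov-2020 09:10:11 99999999';

preg_match_all($re, $str, $matches);

// Group 1 holds the keys, group 2 the values
$result = array_combine($matches[1], $matches[2]);
print_r($result);
```

For this input the result has the three keys `KLARF_SETUP_ID/A`, `RULE_ID/U4` and `TIMESTAMP/A`, with the timestamp value kept intact, including its spaces.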
    
