php arrays parsing preg-match associative-array

Multiline to associative array with PHP


I have a block of text in a variable $s:

FOO:317263
BAR:abcd
BAZ:s fsiu sfd sdf s dfsdddddd

What's the easiest way, with good parsing performance, to turn it into an associative array like this?

$a = array('FOO' => '317263', 'BAR' => 'abcd', 'BAZ' => 's fsiu sfd sdf s dfsdddddd');

So far, I have used:

 preg_match("/^FOO:(.*)$/m", $s, $matches1);
 $a['FOO'] = $matches1[1];
 preg_match("/^BAR:(.*)$/m", $s, $matches1);
 $a['BAR'] = $matches1[1];
 ...

but is there a simpler way to do it?


Solution

  • You can use a non-regex approach such as:

    $s = 'FOO:317263
    BAR:abcd
    BAZ:s fsiu sfd sdf s dfsdddddd';
    
    $a = array();
    foreach (explode("\n", $s) as $line) {
        $chnk = explode(':', $line, 2);
        $a[$chnk[0]] = $chnk[1];
    }
    print_r($a);
    

    After splitting the text on LF, explode(':', $line, 2) splits each line at the first colon only, so any further colons stay in the value.
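
    To make the effect of the limit argument concrete, here is a small sketch. The sample line is made up for illustration and is not from the question.

    ```php
    <?php
    // Hypothetical sample line whose value contains a colon.
    $line = 'BAZ:10:30 start';

    // With a limit of 2, only the first colon is used as the separator.
    $limited = explode(':', $line, 2);

    // Without a limit, every colon splits the string.
    $unlimited = explode(':', $line);

    print_r($limited);   // two elements: 'BAZ' and '10:30 start'
    print_r($unlimited); // three elements: 'BAZ', '10' and '30 start'
    ```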

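    If the input can have different or mixed line endings, replace explode("\n", $s) with preg_split('~\R+~', $s). Use preg_split('~\R+~u', $s) when $s is UTF-8: without the u flag, \R also treats the byte 0x85 as a line break, and that byte can occur inside multibyte UTF-8 characters.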
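
    A minimal sketch of the mixed-line-ending variant, assuming you also want to skip empty lines and lines without a colon. The function name parseKeyValueLines and the sample input are hypothetical, chosen only for this illustration.

    ```php
    <?php
    // Hypothetical helper; the name is not from the original answer.
    function parseKeyValueLines(string $s): array
    {
        $a = [];
        // \R matches any line break sequence (LF, CRLF, CR, ...);
        // PREG_SPLIT_NO_EMPTY drops empty pieces from the result.
        foreach (preg_split('~\R+~', $s, -1, PREG_SPLIT_NO_EMPTY) as $line) {
            $chnk = explode(':', $line, 2);
            if (count($chnk) < 2) {
                continue; // no colon on this line, so there is no value to store
            }
            $a[$chnk[0]] = $chnk[1];
        }
        return $a;
    }

    // Made-up input mixing CRLF, LF, a blank line, CR and a line without a colon.
    $s = "FOO:317263\r\nBAR:abcd\n\nBAZ:s fsiu: sfd\rnot a pair";
    print_r(parseKeyValueLines($s));
    ```

    The guard on count($chnk) avoids an "undefined array key" warning when a line has no colon. Whether to skip such lines or report them is a design choice that depends on your data.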

    If you need more matching than the question shows and really need a regex, consider:

    $a = array();
    if (preg_match_all('~^(\w+)\h*:\h*(.+)~m', $s, $matches)) {
        // trim() takes a string, so apply it to each captured value
        $a = array_combine($matches[1], array_map('trim', $matches[2]));
    }
    print_r($a);
    

    Details:

    • ^ - start of a line (due to m flag)
    • (\w+) - Group 1: one or more word chars
    • \h*:\h* - a colon surrounded by zero or more horizontal whitespace chars
    • (.+) - Group 2: the rest of the line (one or more chars other than line break chars).
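
    As a rough illustration of what the two groups capture, the sketch below runs the pattern on made-up input with spaces or a tab around the colon and CRLF line endings (both are assumptions, not part of the question). Trimming each value also removes a trailing CR that (.+) may pick up.

    ```php
    <?php
    // Hypothetical input: optional whitespace around ":" and CRLF line endings.
    $s = "FOO : 317263\r\nBAR:abcd  \r\nBAZ:\ts fsiu sfd";

    $a = [];
    if (preg_match_all('~^(\w+)\h*:\h*(.+)~m', $s, $matches)) {
        // $matches[1] holds the keys, $matches[2] the raw values.
        $a = array_combine($matches[1], array_map('trim', $matches[2]));
    }
    print_r($a);
    ```

    Note that \w+ only accepts letters, digits and underscores, so a key such as MY-KEY would not match as a whole with this pattern.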

    You can also use parse_ini_string:

    $a = parse_ini_string(preg_replace('/^([^:\v]*):(.*)/m', '$1=\'$2\'', $s), FALSE);
    


    The preg_replace('/^([^:\v]*):(.*)/m', '$1=\'$2\'', $s) call replaces the first : on each line with =, and wraps everything after that first : in single quotes so that parse_ini_string handles any additional = chars in the value correctly. parse_ini_string then returns the array of keys and values.
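
    A short sketch of the two steps, using made-up input whose second value contains both = and :. It assumes the values contain no single quotes and that no key is one of the words the INI parser reserves (such as yes, no, true, false, on, off, null, none).

    ```php
    <?php
    // Hypothetical input; the second value contains "=" and ":".
    $s = "FOO:317263\nBAR:a=b:c";

    // Step 1: turn "KEY:value" into "KEY='value'" so the text looks like INI input.
    $ini = preg_replace('/^([^:\v]*):(.*)/m', '$1=\'$2\'', $s);

    // Step 2: parse it; false means entries are not grouped by [section] headers.
    $a = parse_ini_string($ini, false);

    print_r($a);
    ```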