
Parse a multi-line string with parent data and indented child data and create a multidimensional array


I am trying to parse a string in PHP:

 -father_name "John" -father_weight 44.50 
    -kid >name "Marko" >age 12
    -kid >name "Sevda" >age 17
    -kid >name "Nathan" >age 19

There are two main FORMS:

  1. Attributes (such as -father_name, -father_weight, -kid)
  2. Sub-Attributes (such as >name, >age)

Note: Attribute names are NOT FIXED and are NOT ALWAYS SEPARATED BY a single space

And their VALUES have two types:

  1. String (like "Marko")
  2. Int or Decimal (like 12.00)

OUTPUT would be:

[
    'father_name' => 'John',
    'father_weight' => '44.50',
    'kid' => [
        ['name' => "Marko", 'age' => 12],
        ['name' => "Sevda", 'age' => 17],
        ['name' => "Nathan", 'age' => 19]
    ]
]

It should return FORMS (attrs and sub-attrs) and VALUES SEPARATELY.

Is there a clean way to parse this string in PHP?

Last note: the solution I ended up using for this was YAML.


Solution

  • Try this:

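    function parse_attributes($string, $separators = array('-', '>'), $level = 0){
        $attrs = array();
        // Split on the separator for this nesting level ('-' first, then '>').
        foreach(explode($separators[$level], $string) as $attribute){
            // Split into name and value on the first run of whitespace,
            // so extra spaces, newlines and spaces inside values are tolerated.
            $parts = preg_split('/\s+/', trim($attribute), 2);
            if(count($parts) < 2 || $parts[0] === ''){
                continue; // skip empty chunks and names without a value
            }
            list($name, $value) = $parts;
            // Only look for a deeper separator if one is actually defined.
            $has_next = isset($separators[$level + 1]);
            if($has_next && strpos($value, $separators[$level + 1]) !== false){
                // The value holds sub-attributes: collect repeated names in a list.
                $attrs[$name][] = parse_attributes($value, $separators, $level + 1);
            } else {
                $attrs[$name] = str_replace('"', '', $value);
            }
        }
        return $attrs;
    }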
    

    Usage:

    $returned = parse_attributes('-father_name "John" -father_weight 44.50 -kid >name "Marko" >age 12 -kid >name "Sevda" >age 17 -kid >name "Nathan" >age 19');
    
    print_r($returned);
    

    Returns:

    Array
    (
        [father_name] => John
        [father_weight] => 44.50
        [kid] => Array
            (
                [0] => Array
                    (
                        [name] => Marko
                        [age] => 12
                    )
    
                [1] => Array
                    (
                        [name] => Sevda
                        [age] => 17
                    )
    
                [2] => Array
                    (
                        [name] => Nathan
                        [age] => 19
                    )
    
            )
    
    )
    

    And accessing a single value:

    echo($returned['kid'][0]['name']);
    

    Returns:

    Marko

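    NOTE: You can add more items to the $separators array, one for each attribute level you have. Keep in mind that the string is split on every occurrence of a separator character, so a value that itself contains '-' or '>' (a negative number or a hyphenated name, for example) will not parse correctly.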

    Hope this helps.
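
If values may contain the separator characters themselves, one possible alternative is to tokenize the string with a regular expression instead of splitting on '-' and '>'. The following is only a sketch under stated assumptions: the function name `parse_attributes_regex` is made up for illustration, every value is assumed to be either a double-quoted string or a plain number, and an attribute without a value (such as `-kid`) is assumed to open a new child record. The sample input is also illustrative.

```php
<?php
// Hypothetical regex-based alternative (not part of the answer above).
// Assumes: values are double-quoted strings or numbers, and an attribute
// with no value (e.g. "-kid") starts a new record of ">" sub-attributes.
function parse_attributes_regex(string $input): array
{
    // marker ('-' or '>'), name, then an optional quoted or numeric value
    $pattern = '/([->])(\w+)(?:\s+(?:"([^"]*)"|(-?\d+(?:\.\d+)?)))?/';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $result = [];
    $parent = null; // name of the attribute currently collecting sub-attributes

    foreach ($matches as $m) {
        $marker = $m[1];
        $name   = $m[2];
        $quoted = $m[3] ?? null;
        $number = $m[4] ?? null;

        if ($number !== null) {
            $value = $number + 0;   // numeric string becomes int or float
        } elseif ($quoted !== null) {
            $value = $quoted;       // string value, quotes already stripped
        } else {
            $value = null;          // attribute without a value
        }

        if ($marker === '-') {
            if ($value === null) {
                // Bare attribute: open a new child record under this name.
                $result[$name][] = [];
                $parent = $name;
            } else {
                $result[$name] = $value;
                $parent = null;
            }
        } elseif ($parent !== null) {
            // '>' sub-attribute: attach to the most recently opened record.
            $last = count($result[$parent]) - 1;
            $result[$parent][$last][$name] = $value;
        }
    }

    return $result;
}

// Illustrative multi-line input with irregular spacing and a hyphenated value.
$input = <<<TXT
 -father_name "John" -father_weight 44.50
    -kid >name "Marko" >age 12
    -kid   >name "Jean-Luc Smith" >age 17
TXT;

$parsed = parse_attributes_regex($input);
echo $parsed['father_name'], "\n";     // John
echo $parsed['kid'][1]['name'], "\n";  // Jean-Luc Smith
echo $parsed['kid'][1]['age'], "\n";   // 17
```

Unlike the explode-based function, this sketch casts numeric values to int or float, so `father_weight` comes back as the float 44.5 rather than the string '44.50'. Keep the matched text as-is if the original formatting matters.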