
Pushing to a multidimensional array in a foreach loop using preg_match


I am having some difficulty constructing a multidimensional array using preg_match.

I am trying to break a paragraph down into sentences. Then, for each sentence of the paragraph, I'd like to break every word and punctuation mark down into another level of the array.

Yesterday @Toto helped me use preg_match to explode the string while keeping the punctuation marks as separate elements.

However, I have been struggling to build the array I want from those pieces.

Consider a paragraph like this:

First section. This section, and this. How about this section? And a section; split in two.

Desired Output

I would like the result to look like this:

Array
(
    [0] => Array ( [0] => First [1] => section [2] => . )
    [1] => Array ( [0] => This [1] => section [2] => , [3] => and [4] => this [5] => . )
    [2] => Array ( [0] => How [1] => about [2] => this [3] => section [4] => ? )
    [3] => Array ( [0] => And [1] => a [2] => section [3] => ; [4] => split [5] => in [6] => two [7] => . )
)

My code so far (what I have tried)

It does not work. I am not sure how to clear the contents of $s once I have built the second dimension, but right now I am more puzzled by why the array duplicates every section and adds them under Array [0].

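$m = '    First section. This section, and this. How about this section? And a section; split in two.';

// Split the paragraph into sentences at ., ? and !
$s = preg_split('/\s*[!?.]\s*/u', $m, -1, PREG_SPLIT_NO_EMPTY);

// For each sentence, match the words and punctuation marks,
// then push the match array onto $s
foreach ($s as $x => $var) {
    preg_match_all('/(\w+|[.;?!,:]+)/', $var, $a);
    array_push($s, $a);
}

print_r($s);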
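To see where the "duplication" comes from, here is a minimal sketch that runs the same loop as above and then inspects $s. The count/var_dump lines are added only for illustration, and $x is dropped because it is unused.

```php
<?php
$m = '    First section. This section, and this. How about this section? And a section; split in two.';

// The split pattern has no capture group, so the ".", "?" delimiters are discarded
$s = preg_split('/\s*[!?.]\s*/u', $m, -1, PREG_SPLIT_NO_EMPTY);

// A by-value foreach iterates over the original four sentences only,
// even though $s grows inside the loop
foreach ($s as $var) {
    preg_match_all('/(\w+|[.;?!,:]+)/', $var, $a);
    // $a[0] = full matches, $a[1] = capture group 1; the whole pattern is one
    // group, so both lists are identical. $a is appended AFTER the sentences.
    array_push($s, $a);
}

echo count($s), "\n";            // 8: four sentence strings, then four match arrays
var_dump($s[4][0] === $s[4][1]); // bool(true)
print_r($s[4][0]);               // First, section (no "." because preg_split dropped it)
```

So the sentences are never replaced (array_push only appends), each appended entry holds the same tokens twice ([0] and [1]), and the terminating punctuation is already gone before preg_match_all runs.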

Solution

  • You were almost there. I added PREG_SPLIT_DELIM_CAPTURE, changed the regex passed to preg_split, and collected the results in a separate array instead of pushing them back onto $s. You can do it this way:

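    $str = 'First section. This section, and this. How about this section? And a section; split in two.';
    
    // Capture each whole sentence (text followed by ., ? or !) as a delimiter,
    // and drop the empty pieces left between consecutive sentences
    $matchDelim = preg_split("/([^.?!]+[.?!]+)/", $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    
    $finalArr = [];
    
    foreach ($matchDelim as $match) {
        // Tokenize the sentence into words and single punctuation marks
        preg_match_all('/(\w+|[.;?!,:])/', $match, $matches);
        // $matches[0] holds the full matches; store them as one row
        $finalArr[] = $matches[0];
    }
    
    print_r($finalArr);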
    

    Result:

    Array
    (
        [0] => Array
            (
                [0] => First
                [1] => section
                [2] => .
            )
    
        [1] => Array
            (
                [0] => This
                [1] => section
                [2] => ,
                [3] => and
                [4] => this
                [5] => .
            )
    
        [2] => Array
            (
                [0] => How
                [1] => about
                [2] => this
                [3] => section
                [4] => ?
            )
    
        [3] => Array
            (
                [0] => And
                [1] => a
                [2] => section
                [3] => ;
                [4] => split
                [5] => in
                [6] => two
                [7] => .
            )
    
    )
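For reference, a minimal sketch that shows the intermediate $matchDelim array from the solution, which is where PREG_SPLIT_DELIM_CAPTURE does the work. var_export is used here only for inspection.

```php
<?php
$str = 'First section. This section, and this. How about this section? And a section; split in two.';

// The whole pattern is one capture group, so each matched sentence is returned
// as a "delimiter"; the empty strings between matches are removed by NO_EMPTY
$matchDelim = preg_split("/([^.?!]+[.?!]+)/", $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Elements: 'First section.', ' This section, and this.',
//           ' How about this section?', ' And a section; split in two.'
var_export($matchDelim);
```

The leading spaces on the later sentences are harmless, because the second regex only picks up \w+ runs and the listed punctuation characters.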