Generate array of unique phrases from consecutive words grouped by word count


I'm building a word search in PHP. I split the query with explode() and count the spaces in it to work out how many words it contains.

For example, when a user searches for Hello world, good morning (the query comes from the user and may contain more words), I get:

  • hello
  • world
  • good
  • morning

I want to generate the unique phrases made of consecutive words, grouped into arrays by word count, like this:

  1. ['hello world good morning']
  2. ['hello world good', 'world good morning']
  3. ['hello world', 'good morning', 'world good']
  4. ['hello', 'world', 'good', 'morning']

I can solve cases 1 and 4, but I can't figure out how to build cases 2 and 3.

$oriSearch = 'Hello world, good morning';
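// Lower-case the query, turn every run of non-word characters into one space,
// then trim so leading or trailing punctuation cannot leave a stray space.
$search_query = strtolower($oriSearch);
$search_query = trim(preg_replace('#[\W_]+#', ' ', $search_query));

// One space between each pair of words, so words = spaces + 1.
$totalSpace = substr_count($search_query, ' ');
$totalWord = $totalSpace + 1;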

if($totalSpace > 0)
{
    $wordPlode = explode(' ', $search_query);
    $wordQuery = array();
    for($i=1;$i<=$totalWord;$i++)
    {
        if($i == $totalWord)
        {
            $wordQuery[] = $search_query;
        }
        else if($i == 1) {
            $wordQuery[] = $wordPlode;
        }
        else
        {
            // Here I need the groups of 2 to ($totalWord - 1) consecutive words
        }
    }
    var_dump($wordQuery);
}

Solution

  • Here you go:

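    $clean_string = 'hello world good morning';

    // Split the cleaned query into single words.
    $array = explode(' ', $clean_string);
    $len = count($array);

    $output = [];
    for ($i = 1; $i <= $len; ++$i) {
        // Chunk the words into groups of $i, then join each group into a phrase.
        $output[] = array_map(function ($a) use ($i, $array) {
            // A short last chunk is replaced by the last $i words of the full list.
            if (count($a) != $i) $a = array_slice($array, -$i);
            return implode(' ', $a);
        }, array_chunk($array, $i));
    }
    print_r($output);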
    

    Output

    Array
    (
        [0] => Array
            (
                [0] => hello
                [1] => world
                [2] => good
                [3] => morning
            )
    
        [1] => Array
            (
                [0] => hello world
                [1] => good morning
            )
    
        [2] => Array
            (
                [0] => hello world good
                [1] => world good morning
            )
    
        [3] => Array
            (
                [0] => hello world good morning
            )
    
    )
    

    If you want the groups ordered the other way (longest phrases first), start the for loop at the count of the array and decrement instead.

    Like this:

    for ($i = $len; $i > 0; --$i) {
        $output[] = array_map(function ($a) use ($i, $array) {
            if (count($a) != $i) $a = array_slice($array, -$i);
            return implode(' ', $a);
        }, array_chunk($array, $i));
    }
    

    The output is the same as above, in reverse order.
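
If the ascending version has already been run, another option is to leave the loop alone and flip the finished array with array_reverse(). This is only a sketch of that alternative; $longestFirst is an illustrative variable name, not something from the code above.

```php
<?php
$array = explode(' ', 'hello world good morning');

// Same ascending loop as in the answer: groups of 1 word first.
$output = [];
for ($i = 1; $i <= count($array); ++$i) {
    $output[] = array_map(function ($a) use ($i, $array) {
        if (count($a) != $i) $a = array_slice($array, -$i);
        return implode(' ', $a);
    }, array_chunk($array, $i));
}

// array_reverse() returns a new, reindexed array with the groups in the
// opposite order, so the longest phrase comes first.
$longestFirst = array_reverse($output);
print_r($longestFirst);
```

Either approach should give the same result; reversing afterwards is mainly convenient when both orders are needed.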

    Chunky!

    It's pretty simple: array_chunk takes an array (the single words from explode) and splits it into a multi-dimensional array of chunks of size $i, where $i runs from 1 up to the length of the array.

    Then we array_map over the chunks and implode each sub-array. If a chunk is shorter than $i, which can only happen for the last chunk when the word count isn't a multiple of $i, we use array_slice with a negative offset of $i to take the last $i words of the original array instead. A negative offset counts from the end of the array.

    For example, for index 2 of the above output ($i = 3) we get this inside the for loop:

        # index 2 from the above output ($i = 3)
        # explode
        array("hello", "world", "good", "morning")

        # array_chunk
        Array
        (
            [0] => array("hello", "world", "good")
            [1] => array("morning")
        )

        # array_slice
        Array
        (
            [0] => array("hello", "world", "good")
            [1] => array("world", "good", "morning")
        )

        # implode
        Array
        (
            [0] => "hello world good"
            [1] => "world good morning"
        )
    

    The last chunk, array("morning"), has only 1 item rather than 3 (the value of $i), so we slice the last 3 items from the original array, ['world', 'good', 'morning'], and use that instead.

    Then when we implode both of those, we get what we want.
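
One thing to watch for: chunking yields non-overlapping groups plus one tail group, so for four words and a group size of 2 it gives 'hello world' and 'good morning' but not 'world good', which item 3 in the question also lists. If every run of consecutive words is wanted, a sliding window is one possible alternative. The following is only a sketch under that assumption; consecutivePhrases is a hypothetical helper name, not part of the answer's code.

```php
<?php
// Hypothetical helper: every run of $size consecutive words, as phrases.
function consecutivePhrases(array $words, int $size): array
{
    $phrases = [];
    // A window of $size words can start at offsets 0 .. count - $size.
    for ($start = 0; $start + $size <= count($words); ++$start) {
        $phrases[] = implode(' ', array_slice($words, $start, $size));
    }
    // Drop repeated phrases (possible when the query repeats a word)
    // and reindex the result from 0.
    return array_values(array_unique($phrases));
}

$words = explode(' ', 'hello world good morning');

// Longest group first, matching the order used in the question.
$output = [];
for ($size = count($words); $size >= 1; --$size) {
    $output[] = consecutivePhrases($words, $size);
}

print_r($output);
```

For the four-word example, the two-word group then contains 'hello world', 'world good' and 'good morning'. The trade-off is that longer queries produce more phrases per group than the chunked version does.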

    PS. I didn't bother cleaning the string as you already have that worked out to a serviceable degree.
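
For completeness, here is one way the cleaning step might be folded into the split, so there is no need to count spaces. It reuses the pattern from the question and assumes plain ASCII queries; queryWords is a hypothetical helper name.

```php
<?php
// Hypothetical helper: lower-case the query and split it into words.
function queryWords(string $query): array
{
    // Split on runs of non-word characters or underscores (the question's
    // pattern). PREG_SPLIT_NO_EMPTY drops the empty pieces that leading or
    // trailing punctuation would otherwise leave behind.
    $words = preg_split('#[\W_]+#', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure; fall back to an empty list.
    return $words ?: [];
}

$words = queryWords('Hello world, good morning!');
print_r($words);
echo count($words), "\n"; // word count, no space counting needed
```

The resulting array can be passed straight to the loop above in place of the explode() call.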