Tags: php, merge, substring, string-concatenation

How to concatenate strings without partial duplication in PHP?


I have a series of strings in a PHP array.

Each string sometimes overlaps with the previous one (by one or more words) and sometimes doesn't overlap:

$My_Array = [
  'The quick',
  'quick brown',
  'quick brown fox',
  'jumps over the',
  'over the',
  'lazy dog',
];

I'd like to merge only those strings that overlap, i.e. where the characters at the start of one string already exist at the end of the preceding string.

My aim is to return the following array:

$My_Processed_Array = [
  'The quick brown fox',
  'jumps over the',
  'lazy dog',
];

Work Completed so far:

I have put this together, which works in this instance, but I'm sceptical that it will cover all cases:

function process_my_array($array) {
    
  for ($i = (count($array) - 1); $i > 0; $i--) {
  
    // TURN STRING ELEMENTS INTO MINI-ARRAYS
    $Current_Element = explode(' ', trim($array[$i]));
    $Previous_Element = explode(' ', trim($array[($i - 1)]));
    
    $End_Loop = FALSE;
    
    // STRING-MATCHING ROUTINE
    while ($End_Loop === FALSE) {

      if ($Current_Element[0] === $Previous_Element[(count($Previous_Element) - 1)]) {            
        array_shift($Current_Element);
        $array[$i] = implode(' ', $Current_Element);
        $array[($i - 1)] .= ' '.$array[$i];
        unset($array[$i]);
        $array = array_values($array);
        
        $End_Loop = TRUE;
      }
        
      elseif (count($Current_Element) > 1) {
        $Current_Element[0] .= ' '.$Current_Element[1];
        unset($Current_Element[1]);
        $Current_Element = array_values($Current_Element);
      
        if (isset($Previous_Element[(count($Previous_Element) - 2)])) {
          $Previous_Element[(count($Previous_Element) - 2)] .= ' '.$Previous_Element[(count($Previous_Element) - 1)];
          unset($Previous_Element[(count($Previous_Element) - 1)]);
          $Previous_Element = array_values($Previous_Element);
        }
      }
      
      elseif (count($Current_Element) === 1) {
        $End_Loop = TRUE;
      }
    }
  }
    
  return $array;
}

More importantly, I'm almost certain there must be a much simpler way to achieve the target outcome than what I've put together above.


Solution

    • Split each string on spaces using explode().
    • Compare its words with the words of the previous exploded string, one by one.
    • Keep a pointer into the current string for the comparison.
    • If the word at the pointer doesn't match the current word of the previous string, reset the pointer to 0. Otherwise, increment the pointer.
    • This way, we find the longest suffix of the previous string that is also a prefix of the current string.
    • Slice the current exploded array from the pointer onwards.
    • To stitch the remainder of the current string onto the previous one, use array_merge() and implode() the words back together at the end.
    • If the pointer is still 0 after the comparison, you can safely assume the current string starts a completely new sentence.

    Snippet:

    <?php
    
    $My_Processed_Array = [];
    
    $prev = [];
    $curr = [];
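    foreach($My_Array as $val){
        $val = explode(" ",$val);
        // $ptr counts how many leading words of $val currently match inside $prev.
        $ptr = 0;
        foreach($prev as $index => $prev_val){
            // Strict comparison avoids numeric-string coercion (e.g. "1e1" == "10").
            if($prev_val === $val[$ptr]){
                $ptr++;
            }else{
                $ptr = 0;
            }
            if($ptr === count($val)){
                // All of $val matched: keep the match only if it ends at the end of $prev.
                if($index === count($prev) - 1) break;
                $ptr = 0;
            }
        }
        // Keep only the words of $val that are not already covered by the overlap.
        $sliced_data = array_slice($val, $ptr);
        // No overlap: the sentence built so far is complete, so store it and start anew.
        if($ptr === 0 && !empty($curr)){
            $My_Processed_Array[] = implode(" ", $curr);
            $curr = [];
        }
        $curr = array_merge($curr, $sliced_data);
        $prev = $val;
    }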
    
    if(!empty($curr)){
        $My_Processed_Array[] = implode(" " ,$curr);
    }
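
As a cross-check on the pointer scan in the snippet above, here is a minimal sketch of the same idea that compares word slices directly: it tries the longest possible overlap first and shrinks until the tail of the previous string equals the head of the current one. The function names overlap_length() and merge_overlapping() are hypothetical, and the sketch assumes that overlaps are whole words and that words are separated by whitespace. It is also a little more forgiving with repeated words: the pointer scan resets to 0 on a mismatch without re-testing the current word, so 'a a b' followed by 'a b c' is treated as non-overlapping, whereas the slice comparison finds the shared 'a b'.

```php
<?php

/**
 * Hypothetical helper: how many words at the start of $curr
 * also appear, in the same order, at the very end of $prev.
 * Tries the longest candidate first, so the longest overlap wins.
 */
function overlap_length(array $prev, array $curr): int
{
    for ($len = min(count($prev), count($curr)); $len > 0; $len--) {
        // array_slice() reindexes numeric keys, so === compares word by word.
        if (array_slice($prev, -$len) === array_slice($curr, 0, $len)) {
            return $len;
        }
    }
    return 0;
}

/**
 * Hypothetical wrapper: merges each string into the running sentence
 * when it overlaps the string immediately before it.
 */
function merge_overlapping(array $strings): array
{
    $result = [];
    $prev = []; // words of the previous input string
    $curr = []; // words of the sentence being built

    foreach ($strings as $string) {
        $words = preg_split('/\s+/', trim($string), -1, PREG_SPLIT_NO_EMPTY);
        $overlap = overlap_length($prev, $words);

        // No overlap: the running sentence is finished.
        if ($overlap === 0 && !empty($curr)) {
            $result[] = implode(' ', $curr);
            $curr = [];
        }

        // Append only the words not already covered by the overlap.
        $curr = array_merge($curr, array_slice($words, $overlap));
        $prev = $words;
    }

    if (!empty($curr)) {
        $result[] = implode(' ', $curr);
    }

    return $result;
}

$My_Array = [
    'The quick',
    'quick brown',
    'quick brown fox',
    'jumps over the',
    'over the',
    'lazy dog',
];

print_r(merge_overlapping($My_Array));
```

For the array from the question this yields 'The quick brown fox', 'jumps over the' and 'lazy dog'. The slice comparison is quadratic in the number of words per string in the worst case, which should be fine for short phrases like these.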