Tags: php, arrays, replace, filtering, multibyte

Filter array of multibyte strings ending in a specific letter and conditionally mutate the retained strings


I wrote this function for my script. It works, but it is a little slow. Please review the function, and if you see ways to optimize it, I would appreciate the help.

Here is my code:

function izada($array) {
    $result = []; // must exist before the first array_merge() call below
    foreach ($array as $key => $value) {
        if(substr_count($value, "ӣ") == 2) {
            $result[] = str_replace("ӣ ", "ӣ, ", $value);
        }
        if(mb_substr($value, -1) !== "ӣ") {
            unset($array[$key]);
        }
        if(substr_count($value, "ӣ") == 2) {
            unset($array[$key]);
        }
        $array = array_filter(array_unique(array_merge($array, $result)));
    }
    foreach ($array as $key => $value) {
        if(substr_count($value, "ӣ") > 2 || substr_count($value, "ӣ") < 1) {
            unset($array[$key]);
        }
    }
    return $array;
}

Input and function call:

$array = array (
  "забони тоҷикӣ",
  "хуҷандӣ бӯстонӣ",
  "Тоҷикистон Ватанам",
  "Ғафуровӣ Мичуринӣ Савхозӣ",
  "Конверторӣ хуруфҳо"
);

$array = izada($array);

echo "<pre>";
print_r($array);
echo "</pre>";

The result should be:

Array (
  [0] => забони тоҷикӣ
  [1] => хуҷандӣ, бӯстонӣ
)

Solution

  • Jakub's answer is not optimized, and it is potentially incorrect with respect to the logic of your posted method.

    • It allows a value that contains two ӣ's but does not end with ӣ to qualify. (If that is acceptable, please clarify the requirements in your question.)

    • It calls substr_count() one to three times per iteration, depending on which conditions are met. For efficiency, the key consideration here is minimizing redundant function calls.
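To illustrate the first point with a made-up value (not from the question, and without reproducing Jakub's code): a check based only on the number of ӣ's accepts the string below, while requiring the final character to be ӣ rejects it.

```php
<?php
// Hypothetical sample value: it contains two ӣ but ends with "р", not ӣ.
$sample = "хуҷандӣ бӯстонӣ шаҳр";

// A filter that only counts occurrences would accept it...
$passesCountOnly = substr_count($sample, "ӣ") === 2;

// ...but requiring the last character to be ӣ rejects it.
$endsWithTarget = mb_substr($sample, -1, 1, 'UTF-8') === "ӣ";

var_dump($passesCountOnly, $endsWithTarget); // prints bool(true) then bool(false)
```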

    Here is a more accurate and more efficient process:

    Method:

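    $result = []; // initialize so var_export() gets an array even if nothing qualifies
    foreach ($array as $v) {
        // Cheapest rejection first: the value must end with ӣ
        if (mb_substr($v, -1) === "ӣ") {
            // Count once and cache the result in $count
            if (($count = substr_count($v, "ӣ")) === 1) {
                $result[] = $v;                             // one ӣ: keep as-is
            } elseif ($count === 2) {
                $result[] = str_replace("ӣ ", "ӣ, ", $v);  // two ӣ: inject a comma
            }
        }
    }
    var_export($result);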
    

    Output:

    array (
      0 => 'забони тоҷикӣ',
      1 => 'хуҷандӣ, бӯстонӣ',
    )
    

    Notice that my method first requires the final character to be ӣ. This rejects non-qualifying values as early as possible, before $count is computed or overwritten for them. $count then caches the result of substr_count(), so each qualifying value needs only one call to that function, which improves efficiency.
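If you are on PHP 8.0 or later, the same loop can be wrapped in a reusable function and the mb_substr() check swapped for str_ends_with(), which compares raw bytes. Because UTF-8 is self-synchronizing, a valid UTF-8 string whose bytes end with the bytes of ӣ necessarily ends with the character ӣ, so no mbstring call is needed for this test. This is a sketch assuming PHP 8.0+ and valid UTF-8 input; the function name filterIzada() is hypothetical.

```php
<?php
// Sketch assuming PHP 8.0+ (str_ends_with) and valid UTF-8 input.
// filterIzada() is a hypothetical name, not part of the original answer.
function filterIzada(array $values): array
{
    $result = [];
    foreach ($values as $v) {
        // Byte-wise suffix test: safe for valid UTF-8, no mbstring call needed.
        if (!str_ends_with($v, "ӣ")) {
            continue;
        }
        // Count once per qualifying value and reuse the cached result.
        $count = substr_count($v, "ӣ");
        if ($count === 1) {
            $result[] = $v;
        } elseif ($count === 2) {
            // Replace every "ӣ " with "ӣ, "; with exactly two ӣ and one at the
            // end of the string, only the first can be followed by a space.
            $result[] = str_replace("ӣ ", "ӣ, ", $v);
        }
    }
    return $result;
}

$array = [
    "забони тоҷикӣ",
    "хуҷандӣ бӯстонӣ",
    "Тоҷикистон Ватанам",
    "Ғафуровӣ Мичуринӣ Савхозӣ",
    "Конверторӣ хуруфҳо",
];

var_export(filterIzada($array));
```

Whether this is measurably faster than the mb_substr() version depends on your PHP version and data, so it is worth benchmarking against your real input.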


    Update: if my earlier snippet is logically correct, it can be refactored into a single preg_filter() call with two patterns: one that matches qualifying values needing no comma injection (replaced with themselves) and one that injects a comma after the first ӣ.

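    var_export(
        preg_filter(
            [
                '/^[^ӣ]*(?:ӣ(?! ))?[^ӣ]*ӣ$/u', // ends with ӣ; an earlier ӣ (if any) is not followed by a space
                '/^[^ӣ]*ӣ\K [^ӣ]*ӣ$/u',       // exactly two ӣ, the first followed by a space
            ],
            ['$0', ',$0'], // keep the match as-is / prefix the matched " ...ӣ" tail with a comma
            $array
        )
    );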
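Because the update is conditional on the loop being logically correct, one way to gain confidence is to run both versions side by side. The sketch below wraps each approach in a hypothetical helper (loopMethod() and regexMethod()) and compares them on the question's input plus a few made-up edge cases. Agreement on these inputs is evidence, not proof, that the two are equivalent. Note that preg_filter() preserves the original keys of matched elements, so the regex result is reindexed with array_values() before comparing.

```php
<?php
// Cross-check sketch: loop method vs. preg_filter() method.
// loopMethod() and regexMethod() are hypothetical helper names.
function loopMethod(array $array): array
{
    $result = [];
    foreach ($array as $v) {
        if (mb_substr($v, -1, 1, 'UTF-8') === "ӣ") {
            if (($count = substr_count($v, "ӣ")) === 1) {
                $result[] = $v;
            } elseif ($count === 2) {
                $result[] = str_replace("ӣ ", "ӣ, ", $v);
            }
        }
    }
    return $result;
}

function regexMethod(array $array): array
{
    // preg_filter() keeps original keys, so reindex before comparing.
    return array_values(preg_filter(
        ['/^[^ӣ]*(?:ӣ(?! ))?[^ӣ]*ӣ$/u', '/^[^ӣ]*ӣ\K [^ӣ]*ӣ$/u'],
        ['$0', ',$0'],
        $array
    ));
}

$inputs = [
    "забони тоҷикӣ",
    "хуҷандӣ бӯстонӣ",
    "Тоҷикистон Ватанам",
    "Ғафуровӣ Мичуринӣ Савхозӣ",
    "Конверторӣ хуруфҳо",
    "хуҷандӣ бӯстонӣ шаҳр", // hypothetical: two ӣ, wrong ending
    "ӣ",                    // hypothetical: a lone ӣ
    "аӣбӣ",                 // hypothetical: two ӣ, no space after the first
];

$loop  = loopMethod($inputs);
$regex = regexMethod($inputs);
var_export($loop === $regex); // prints true for these inputs
```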