php, string, truncate, multibyte

Truncate a multibyte String to n chars


I am trying to get this method in a String Filter working:

public function truncate($string, $chars = 50, $terminator = ' …');

I'd expect this

$in  = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWYXZ1234567890";
$out = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …";

and also this

$in  = "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝ";
$out = "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …";

That is, $chars characters in total, including the $terminator string.

In addition, the filter is supposed to cut at the first word boundary below the $chars limit, e.g.

$in  = "Answer to the Ultimate Question of Life, the Universe, and Everything.";
$out = "Answer to the Ultimate Question of Life, the …";

I am pretty certain this should work with these steps

  • subtract the number of chars in the terminator from the maximum chars
  • validate that string is longer than the calculated limit or return it unaltered
  • find the last space character in string below calculated limit to get word boundary
  • cut string at last space or calculated limit if no last space is found
  • append terminator to string
  • return string

However, I have tried various combinations of str* and mb_* functions, and all of them yielded wrong results. This can't be so difficult, so I am obviously missing something. Would someone share a working implementation for this, or point me to a resource where I can finally understand how to do it?

Thanks

P.S. Yes, I have checked https://stackoverflow.com/search?q=truncate+string+php before :)


Solution

  • Try this:

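    function truncate($string, $chars = 50, $terminator = ' …') {
        // Strings that already fit are returned unaltered.
        if (mb_strlen($string) <= $chars) {
            return $string;
        }
        $cutPos = $chars - mb_strlen($terminator);
        // Last space at or before the cut position; false if there is none.
        $boundaryPos = mb_strrpos(mb_substr($string, 0, $cutPos + 1), ' ');
        return mb_substr($string, 0, $boundaryPos === false ? $cutPos : $boundaryPos) . $terminator;
    }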
    

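    But you need to make sure that your internal encoding is set to match the input (for example with mb_internal_encoding('UTF-8')), because the mb_* functions fall back to it when no encoding argument is passed.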
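
One way to avoid depending on the internal encoding is to pass the encoding to every mb_* call. The following is only a minimal sketch along those lines, assuming the input is valid UTF-8. The function name truncate_utf8 and the hard-coded 'UTF-8' are illustrative assumptions, not part of the original filter.

```php
<?php
// Hypothetical variant: passes the encoding explicitly instead of relying on
// mb_internal_encoding(). Assumes the input is valid UTF-8.
function truncate_utf8(string $string, int $chars = 50, string $terminator = ' …'): string
{
    $enc = 'UTF-8';

    // Strings that already fit are returned unaltered.
    if (mb_strlen($string, $enc) <= $chars) {
        return $string;
    }

    // Reserve room for the terminator.
    $limit = max(0, $chars - mb_strlen($terminator, $enc));

    // Look one character past the limit so that a space sitting exactly
    // at the limit still counts as a word boundary.
    $head = mb_substr($string, 0, $limit + 1, $enc);
    $boundary = mb_strrpos($head, ' ', 0, $enc);

    // Fall back to a hard cut when no space is found.
    $cut = ($boundary === false) ? $limit : $boundary;

    return mb_substr($string, 0, $cut, $enc) . $terminator;
}

echo truncate_utf8('Answer to the Ultimate Question of Life, the Universe, and Everything.'), "\n";
// Answer to the Ultimate Question of Life, the …
```

Because the length check, the cut, and the boundary search all count characters rather than bytes, the result should stay within $chars characters for accented input as well, as long as the string really is UTF-8.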