
Trying to count characters after submitting a comment, mb_strlen gives back weird results


In my controller, I access the comment data with $this->request->data['Comment']['text']. I use CakePHP's FormHelper to build the form and a plugin called Summernote to turn the textarea into a WYSIWYG editor. The comment is saved as HTML in my database.

In this case, I am trying to submit a comment containing just '>':

$data = $this->request->data['Comment']['text'];

pr($data);
//returns >

pr(mb_strlen($data, 'UTF-8'));
//returns 4

pr(mb_strlen('>', 'UTF-8'));
//returns 1
//this is the one that confuses me the most:
//it seems there's a difference between $data and '>'

mb_detect_encoding($data);
//returns ASCII

I already use jQuery to count the characters entered on the front end, so I can disable the submit button when the user goes over the limit. That check uses .innerText.length and works like a charm, but it can't be the only check: anyone can open the browser's element inspector, re-enable the submit button, and send a comment of any length.
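Because the front-end check can be bypassed, the same limit has to be enforced again on the server. Below is a minimal sketch in plain PHP. MAX_COMMENT_LENGTH, visibleLength() and isWithinLimit() are hypothetical names, and the limit of 500 is an assumed value. Stripping tags and then decoding entities only approximates what .innerText.length counts in the browser. For example, a <br> line break counts as one character in the browser but is dropped here.

```php
<?php
// Hypothetical limit; it should match whatever the front-end check enforces.
const MAX_COMMENT_LENGTH = 500;

/**
 * Approximates the visible text length of an HTML comment.
 * Tags are stripped first, then entities are decoded (decoding first would
 * turn "&lt;b&gt;" into "<b>" and strip it). Leading/trailing whitespace
 * is trimmed, and characters (not bytes) are counted.
 */
function visibleLength(string $html): int
{
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return mb_strlen(trim($text), 'UTF-8');
}

// Returns true when the comment's visible text fits within the limit.
function isWithinLimit(string $html, int $max = MAX_COMMENT_LENGTH): bool
{
    return visibleLength($html) <= $max;
}

var_dump(visibleLength('<p>&gt;&gt;&gt;</p>')); // int(3)
var_dump(isWithinLimit(str_repeat('a', 501)));  // bool(false)
```

Such a function could be called in the controller before the comment is saved, and the request rejected when it returns false. How best to hook it into CakePHP's own validation is left open here.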

EDIT: var_dump($this->request->data['Comment']['text']) gave me the following result:

Note that, unlike in the examples above, I am sending '>>>' here:

array (size=1)
  'text' => string '>>>' (length=12)
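These numbers are consistent with the editor sending HTML in which > is escaped as the entity &gt;. A browser renders &gt; as >, which is why pr() appears to print >. The string itself, however, holds four ASCII characters, which also explains the ASCII result from mb_detect_encoding(). The small check below assumes that this is what Summernote submits. It uses htmlspecialchars() only to produce the escaped form and does not reproduce Summernote's own code path.

```php
<?php
// Assumption: a typed '>' reaches PHP as the HTML entity '&gt;'.
$single = htmlspecialchars('>');   // '&gt;'
$triple = htmlspecialchars('>>>'); // '&gt;&gt;&gt;'

// A browser displays both as plain '>' characters, but PHP counts the entity text.
echo mb_strlen($single, 'UTF-8'), "\n"; // 4, matching the first result
echo mb_strlen($triple, 'UTF-8'), "\n"; // 12, matching (length=12)
```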

EDIT: Alex_Tartan figured out the problem: I needed to call html_entity_decode() on the string before counting it with mb_strlen()!


Solution

  • I've tested the case here: https://3v4l.org/VLr9e

    The problem might be an untrimmed $data. Whitespace doesn't show up in a regular print, but you can see it with var_dump($data).

    Any whitespace used to format the HTML between the <textarea> tags becomes part of its value.
    See the question "Why is textarea filled with mysterious white spaces?"

    To handle that, trim the value before counting:

    $data = '>   ';
    // var_dump($data) will output:
    // string(4) ">   "
    $data = trim($data);
    
    echo $data."\n";
    //returns >
    
    echo mb_strlen($data, 'UTF-8')."\n";
    //returns 1
    
    echo mb_strlen('>', 'UTF-8')."\n";
    //returns 1
    

    Update (from comments):

    The problem was HTML-encoded characters, which needed to be decoded:

    $data = html_entity_decode($data);
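A minimal sketch of this fix, assuming the value arrives as &gt;&gt;&gt; as in the question's var_dump output. The explicit ENT_QUOTES | ENT_HTML5 flags and 'UTF-8' encoding are choices made for this example, not something the answer requires. The second part shows why mb_strlen(), rather than strlen(), is the right counter once entities decode into multibyte characters.

```php
<?php
// Value as submitted by the editor (HTML-escaped), per the question.
$data = '&gt;&gt;&gt;';

// Decode entities first, then count characters.
$decoded = html_entity_decode($data, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, "\n";                     // >>>
echo mb_strlen($decoded, 'UTF-8'), "\n"; // 3

// Decoding can produce multibyte characters: strlen() counts bytes,
// mb_strlen() counts characters.
$word = html_entity_decode('caf&eacute;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo strlen($word), "\n";                // 5
echo mb_strlen($word, 'UTF-8'), "\n";    // 4
```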