Tags: php, codeigniter, post, character-encoding, replace

CodeIgniter replacing selected characters in all POST input


I'm planning to convert an existing intranet system to CodeIgniter. I've always used UTF-8 throughout so that it can handle all sorts of different characters; this is essential for the system (outputting invoices, address labels, etc.).

There are a few characters I decided to replace automatically on input, as they often end up confusing the users of the system.

  • Curly quotes, both single and double, replaced with a normal apostrophe or quotation mark
  • En dash and em dash, replaced with a normal hyphen
  • Ellipses, replaced with three full stops

At least these punctuation symbols are now all used and stored consistently.

Data that is to be stored in a database is always received by POST in this system, so I run the following function over the POST array on every page load...

function nasty_chars_replace(&$var) {

    $trans_table = array(
        chr(0xe2).chr(0x80).chr(0x9a) => '\'', //SINGLE LOW-9 QUOTATION MARK
        chr(0xe2).chr(0x80).chr(0x9e) => '"', //DOUBLE LOW-9 QUOTATION MARK
        chr(0xe2).chr(0x80).chr(0xa6) => '...', //HORIZONTAL ELLIPSIS
        chr(0xe2).chr(0x80).chr(0x98) => '\'', //LEFT SINGLE QUOTATION MARK
        chr(0xe2).chr(0x80).chr(0x99) => '\'', //RIGHT SINGLE QUOTATION MARK
        chr(0xe2).chr(0x80).chr(0x9c) => '"', //LEFT DOUBLE QUOTATION MARK
        chr(0xe2).chr(0x80).chr(0x9d) => '"', //RIGHT DOUBLE QUOTATION MARK
        chr(0xe2).chr(0x80).chr(0x93) => '-', //EN DASH
        chr(0xe2).chr(0x80).chr(0x94) => '-' //EM DASH
    );

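    foreach ($trans_table as $utf8_code => $replace) {
        $var = str_replace($utf8_code, $replace, $var);
    }

    // array_walk_recursive() ignores the callback's return value,
    // so the trimmed string has to be written back through the reference.
    $var = trim($var);
}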
array_walk_recursive($_POST, 'nasty_chars_replace');
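
To make the intended effect concrete, here is a minimal sketch that applies the same replacement table to a nested array standing in for `$_POST`. It is not the original code: it swaps the `str_replace()` loop for a single `strtr()` call and writes the bytes as `"\x.."` escapes, and the function name `clean_punctuation` and the sample data are made up for illustration.

```php
<?php
// Hypothetical variant of the cleaning callback, for illustration only.
function clean_punctuation(&$value) {
    // UTF-8 byte sequences mapped to their plain ASCII replacements.
    static $map = array(
        "\xE2\x80\x9A" => "'",   // SINGLE LOW-9 QUOTATION MARK
        "\xE2\x80\x9E" => '"',   // DOUBLE LOW-9 QUOTATION MARK
        "\xE2\x80\xA6" => '...', // HORIZONTAL ELLIPSIS
        "\xE2\x80\x98" => "'",   // LEFT SINGLE QUOTATION MARK
        "\xE2\x80\x99" => "'",   // RIGHT SINGLE QUOTATION MARK
        "\xE2\x80\x9C" => '"',   // LEFT DOUBLE QUOTATION MARK
        "\xE2\x80\x9D" => '"',   // RIGHT DOUBLE QUOTATION MARK
        "\xE2\x80\x93" => '-',   // EN DASH
        "\xE2\x80\x94" => '-',   // EM DASH
    );

    // Assign through the reference; a return value would be discarded.
    $value = trim(strtr($value, $map));
}

// Sample data standing in for $_POST (assumed, not from a real form).
$post = array(
    'name'  => " \xE2\x80\x9CAcme\xE2\x80\x9D Ltd ",
    'lines' => array("10\xE2\x80\x9312 High St\xE2\x80\xA6"),
    'notes' => "   ",
);

array_walk_recursive($post, 'clean_punctuation');

// $post['name']     is now '"Acme" Ltd'
// $post['lines'][0] is now '10-12 High St...'
// $post['notes']    is now '' (whitespace-only input becomes empty)
```

`strtr()` with an array does all replacements in one pass, which avoids looping over the table for every value; the behaviour for this table should be the same as the loop.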

Is there a method for doing something similar in CodeIgniter, globally on all POST data (if POST is present)?

Does anyone else do anything like this?
Are there any other characters that are easily confused that I should consider "cleansing" (not sanitizing as such) for data consistency?
EDIT: Is this even a good idea?

EDIT 2: I should say that I also trim() all POST data to remove leading/trailing whitespace, so validation fails if someone just fills an input with whitespace.


Solution

  • If you would like to run this on all POST data without having to call the function every time, consider placing it either in the constructor of your controller or in the constructor of your controller's parent class.

    In Constructor of Controller:

    class Home extends CI_Controller{
        function __construct(){
            parent::__construct();
            if(!empty($_POST)) array_walk_recursive($_POST, 'nasty_chars_replace');
        }
    
        function index(){
            //typically a GET request, nasty_chars_replace will not execute.
        }
    
        function post_here1(){
            //will be nasty char cleaned.
        }
    
        function post_here2(){
            //will be nasty char cleaned.
        }
    
    }
    

    As you can imagine, this would have to be repeated in the constructor of every controller. To write it only once, have your controllers extend a base class and put the call in the parent's constructor:

    In Constructor of Parent Controller:

    class MY_Controller extends CI_Controller{
        function __construct(){
            parent::__construct();
            if(!empty($_POST)) array_walk_recursive($_POST, 'nasty_chars_replace');
        }
    }
    

    ...and back in home.php:

    class Home extends MY_Controller{
        function __construct(){
            parent::__construct();
        }
    
        function index(){
            //typically a GET request, nasty_chars_replace will not execute.
        }
    
        function post_here1(){
            //will be nasty char cleaned.
        }
    
        function post_here2(){
            //will be nasty char cleaned.
        }
    }
    

    I encourage you to have a look at Phil Sturgeon's "Keeping it DRY" post for more information on how to use this kind of base class inheritance.
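
The base-class approach can be tried outside CodeIgniter with a small sketch like the one below. The `CI_Controller` class here is only a stand-in so the example runs on its own (in a real application it comes from the framework), the replacement table is shortened, and the `$_POST` contents are simulated. It also assumes `nasty_chars_replace` is defined somewhere that is loaded before the controller is constructed.

```php
<?php
// Stand-in for the framework's base controller (assumption for this sketch).
class CI_Controller {
    public function __construct() {}
}

// Shortened version of the cleaning callback; it must be defined before
// any controller is constructed.
function nasty_chars_replace(&$var) {
    $var = trim(strtr($var, array(
        "\xE2\x80\x9C" => '"', // LEFT DOUBLE QUOTATION MARK
        "\xE2\x80\x9D" => '"', // RIGHT DOUBLE QUOTATION MARK
        "\xE2\x80\x93" => '-', // EN DASH
    )));
}

// Parent controller: cleans $_POST once for every child controller.
class MY_Controller extends CI_Controller {
    public function __construct() {
        parent::__construct();
        if (!empty($_POST)) {
            array_walk_recursive($_POST, 'nasty_chars_replace');
        }
    }
}

// Child controller: inherits the constructor, so no extra code is needed.
class Home extends MY_Controller {
    public function post_here1() {
        return $_POST['title']; // already cleaned by MY_Controller
    }
}

// Simulate a form submission.
$_POST = array('title' => " \xE2\x80\x9CQ3\xE2\x80\x9D report \xE2\x80\x93 draft ");

$home   = new Home();
$result = $home->post_here1(); // '"Q3" report - draft'
```

Because `Home` declares no constructor of its own, PHP runs the inherited `MY_Controller` constructor, so the cleaning happens before any action method reads `$_POST`.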