I'm planning to convert an existing intranet system to CodeIgniter. I've always used UTF-8 throughout so it can handle all sorts of different characters; this is essential for the system (outputting invoices, address labels, etc.).
There are a few characters I decided to replace automatically on input, as they often end up confusing the users of the system.
At least these punctuation symbols are now all used and stored consistently.
Data that is to be stored in a database is always received by POST in this system, so I run the following function over the POST array on every page load...
function nasty_chars_replace(&$var) {
$trans_table = array(
chr(0xe2).chr(0x80).chr(0x9a) => '\'', //SINGLE LOW-9 QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x9e) => '"', //DOUBLE LOW-9 QUOTATION MARK
chr(0xe2).chr(0x80).chr(0xa6) => '...', //HORIZONTAL ELLIPSIS
chr(0xe2).chr(0x80).chr(0x98) => '\'', //LEFT SINGLE QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x99) => '\'', //RIGHT SINGLE QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x9c) => '"', //LEFT DOUBLE QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x9d) => '"', //RIGHT DOUBLE QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x93) => '-', //EN DASH
chr(0xe2).chr(0x80).chr(0x94) => '-' //EM DASH
);
foreach ($trans_table as $utf8_code => $replace) {
$var = str_replace($utf8_code, $replace, $var);
}
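// array_walk_recursive() ignores the callback's return value,
// so write the trimmed string back through the reference.
$var = trim($var);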
}
array_walk_recursive($_POST, 'nasty_chars_replace');
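For reference, here is a compact sketch of the same cleanup that can be run on its own. The name `clean_post_value` and the sample array are made up for illustration, and only some of the mappings are shown. `strtr()` with an array does all replacements in one pass, and the result is written back through the reference because `array_walk_recursive()` discards whatever the callback returns.

```php
<?php
// Hypothetical helper name; same idea as nasty_chars_replace() above.
function clean_post_value(&$value, $key = null)
{
    // "\xE2\x80\x9C" is the UTF-8 byte sequence for U+201C, and so on.
    static $map = array(
        "\xE2\x80\x98" => "'",   // LEFT SINGLE QUOTATION MARK
        "\xE2\x80\x99" => "'",   // RIGHT SINGLE QUOTATION MARK
        "\xE2\x80\x9C" => '"',   // LEFT DOUBLE QUOTATION MARK
        "\xE2\x80\x9D" => '"',   // RIGHT DOUBLE QUOTATION MARK
        "\xE2\x80\xA6" => '...', // HORIZONTAL ELLIPSIS
        "\xE2\x80\x93" => '-',   // EN DASH
    );
    if (is_string($value)) {
        // Replace first, then trim, and assign in place.
        $value = trim(strtr($value, $map));
    }
}

// Made-up sample standing in for $_POST.
$sample = array(
    'name'  => "  O\xE2\x80\x99Brien ",
    'notes' => array("\xE2\x80\x9CFragile\xE2\x80\xA6\xE2\x80\x9D"),
);
array_walk_recursive($sample, 'clean_post_value');
// $sample['name'] is now "O'Brien"
// $sample['notes'][0] is now "\"Fragile...\""
```

Nested arrays (for example inputs named `notes[]`) are handled because `array_walk_recursive()` only calls the callback on leaf values.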
Is there a method for doing something similar in CodeIgniter, globally on all POST data (if POST is present)?
Does anyone else do anything like this?
Are there any other characters that are easily confused that I should consider "cleansing" (not sanitizing as such) for data consistency?
EDIT: Is this even a good idea?
EDIT 2: I should say that I also trim() all POST data to remove leading/trailing whitespace, so validation can fail if someone decides to just fill an input with whitespace.
If you would like to run this on all POST data without having to call the function every time, consider placing it either in the constructor of your controller or in the constructor of your controller's parent class.
class Home extends CI_Controller{
function __construct(){
parent::__construct();
if(!empty($_POST)) array_walk_recursive($_POST, 'nasty_chars_replace');
}
function index(){
//typically a GET request, nasty_chars_replace will not execute.
}
function post_here1(){
//will be nasty char cleaned.
}
function post_here2(){
//will be nasty char cleaned.
}
}
As you can imagine, this would have to be repeated in the constructor of every controller. To write it only once, put it in the constructor of a base class and extend your controllers from that. In CodeIgniter 2.x and later, a class named MY_Controller saved as application/core/MY_Controller.php is loaded automatically:
class MY_Controller extends CI_Controller{
function __construct(){
parent::__construct();
if(!empty($_POST)) array_walk_recursive($_POST, 'nasty_chars_replace');
}
}
...and back in home.php:
class Home extends MY_Controller{
function __construct(){
parent::__construct();
}
function index(){
//typically a GET request, nasty_chars_replace will not execute.
}
function post_here1(){
//will be nasty char cleaned.
}
function post_here2(){
//will be nasty char cleaned.
}
}
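To see why the base-class version covers every POST handler, here is a stand-alone sketch that runs without CodeIgniter. `Fake_CI_Controller` is a stand-in invented for this example (it is not a CodeIgniter class), `$_POST` is filled by hand, and the cleaning callback is reduced to a single mapping plus trim().

```php
<?php
// Stand-in for CI_Controller so the sketch runs outside CodeIgniter (assumption).
class Fake_CI_Controller
{
    public function __construct() {}
}

// Reduced version of the cleaning callback: one mapping plus trim().
function demo_chars_replace(&$var)
{
    $var = trim(str_replace("\xE2\x80\x94", '-', $var)); // EM DASH to hyphen
}

class Demo_Base_Controller extends Fake_CI_Controller
{
    public function __construct()
    {
        parent::__construct();
        // Runs once per request, before any action method is called.
        if (!empty($_POST)) {
            array_walk_recursive($_POST, 'demo_chars_replace');
        }
    }
}

class Demo_Home extends Demo_Base_Controller
{
    // No constructor here: PHP falls back to the inherited one.
    public function post_here1()
    {
        return $_POST['title'];
    }
}

// Simulate a POST request with hand-filled data.
$_POST = array('title' => " Invoice \xE2\x80\x94 March ");
$controller = new Demo_Home();
$title = $controller->post_here1(); // "Invoice - March"
```

A child controller only needs its own constructor if it does extra setup; in that case it must call `parent::__construct()` so the cleaning step still runs.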
I encourage you to have a look at Phil Sturgeon's Keeping It Dry post for more information on how to use this base class inheritance.