
Sanitizing PHP/SQL $_POST, $_GET, etc...?


OK, I understand that this subject is a hotbed, and that the right approach depends on the code you are using. I have three situations that need to be resolved.

  1. I have a form where people need to be able to make comments and statements that use commas, tildes, etc., while the site still remains safe from attacks.

  2. I have people entering dates like this: 10/13/11 (mm/dd/yy, US English format). Can this be sanitized?

  3. How do I learn to use htmlspecialchars(), htmlentities() and real_escape_string() correctly? I've read the php.net site and some posts here, but it seems to me that the right answer depends on who is reading the question.

I really can't accept that... there has to be an answer wherein text formats similar to that which I am posting here can be sanitized. I'd like to know if and how it is possible.

Thanks... because it seems to me that when asking this question in other places it tends to annoy... I am learning what I need to know but I think I have hit a plateau in what I can know without an example of what it is meant to do...

Thanks in advance.


Solution

  • It's a very important question and it actually has a simple answer in the form of encodings. The problem you are facing is that you use a lot of languages at the same time. First you are in HTML, then in PHP and a few seconds later in SQL. All these languages have their own syntax rules.

    The thing to remember is: a string should at all times be in its proper encoding.

    Let's take an example. You have an HTML form and the user enters the following string into it:

    I really <3 dogs & cats ;')

    Upon pressing the submit button, this string is sent to your PHP script. Let's assume this is done through GET. It gets appended to the URL, which has its own syntax (the & character has special meaning, for instance), so we are changing languages. This means the string must be transformed into the proper URL-encoding. In this case the browser does it, but PHP also has a urlencode function for that.
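
    As a rough illustration of that URL step, here is a minimal sketch of what the sample string looks like once it is URL-encoded. It only uses PHP's built-in urlencode(), rawurlencode() and urldecode(); the variable names are made up for the example.

    ```php
    <?php
    $comment = "I really <3 dogs & cats ;')";

    // urlencode() produces the form-style encoding a browser uses for GET forms:
    // spaces become "+", and characters such as < & ; ' ) become %XX sequences.
    $formEncoded = urlencode($comment);
    // I+really+%3C3+dogs+%26+cats+%3B%27%29

    // rawurlencode() follows RFC 3986 instead, so spaces become "%20".
    $rawEncoded = rawurlencode($comment);
    // I%20really%20%3C3%20dogs%20%26%20cats%20%3B%27%29

    // Decoding gives back exactly what the user typed.
    $decoded = urldecode($formEncoded);
    ```

    PHP decodes the query string for you, so $_GET['comment'] already holds the original text; you only need urlencode() when you build URLs yourself.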

    In the PHP script, the string is stored in $_GET, already decoded from its URL-encoding into a plain PHP string. As long as you are coding PHP, this is perfectly fine. But now let's put the string to use in a SQL query. We change languages and syntax rules, so the string must be encoded as SQL through the mysql_real_escape_string function.
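
    One caveat for readers on current PHP: the mysql_* extension was removed in PHP 7.0. The same "encode for SQL" step is usually handed to the database driver through a prepared statement. The sketch below is only one way to do it, under some assumptions: it uses PDO, the function names (openDemoDatabase, insertComment, fetchComments) are hypothetical, and an in-memory SQLite database stands in for MySQL so the example needs no server. The posts table and comment column are taken from the example in this answer.

    ```php
    <?php
    // Hypothetical helper: opens a throwaway in-memory SQLite database
    // (a stand-in for a real MySQL connection) and creates the posts table.
    function openDemoDatabase(): PDO
    {
        $pdo = new PDO('sqlite::memory:');
        $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, comment TEXT)');
        return $pdo;
    }

    // Hypothetical helper: the comment is passed as a bound parameter,
    // so the driver handles the PHP-string-to-SQL step for us.
    function insertComment(PDO $pdo, string $comment): void
    {
        $stmt = $pdo->prepare('INSERT INTO posts (comment) VALUES (:comment)');
        $stmt->execute([':comment' => $comment]);
    }

    // Hypothetical helper: returns all stored comments as plain PHP strings.
    function fetchComments(PDO $pdo): array
    {
        return $pdo->query('SELECT comment FROM posts ORDER BY id')
                   ->fetchAll(PDO::FETCH_COLUMN);
    }
    ```

    Calling insertComment($pdo, $_GET['comment']) stores the text exactly as typed, quotes and all, and fetchComments($pdo) returns it unchanged.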

    At the other end, we might want to display the string back to the users again. We retrieve the string from the database and it is returned to us as a PHP string. When we want to embed it in HTML for output, we're changing languages again so we must encode our string to HTML through the htmlspecialchars function.
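
    To show what that HTML step produces, and how htmlspecialchars() differs from htmlentities(), here is a small sketch. The helper name h() is hypothetical; the explicit flags and the UTF-8 charset are assumptions about a typical setup.

    ```php
    <?php
    // Hypothetical helper: encode a PHP string for use in HTML text or attributes.
    function h(string $text): string
    {
        return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }

    $comment = "I really <3 dogs & cats ;')";
    $safeHtml = h($comment);
    // I really &lt;3 dogs &amp; cats ;&#039;)

    // htmlspecialchars() only touches the characters that are special in HTML
    // (& < > and quotes). htmlentities() also converts every other character
    // that has a named entity, which is rarely needed on a UTF-8 page.
    $word = "caf\u{e9}";                                      // "café"
    $special  = htmlspecialchars($word, ENT_QUOTES, 'UTF-8'); // left unchanged
    $entities = htmlentities($word, ENT_QUOTES, 'UTF-8');     // caf&eacute;
    ```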

    All along the way, the string has been in the proper encoding, which means any character the user can come up with will be dealt with accordingly. Everything should be running smoothly and safely.

    A thing to avoid (sometimes this is even recommended by the ignorant) is prematurely encoding your string. For instance, you could apply htmlspecialchars to the string before putting it in the database. This way, when you retrieve the string later from the database you can stick it in the HTML no problem. Sounds great? Yeah, really great until you start getting support tickets from people wondering why their PDF receipts are full of &amp; &gt; junk.
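
    A short sketch of how that goes wrong, using only htmlspecialchars(); the variable names and the "receipt" line are invented for the illustration.

    ```php
    <?php
    $comment = "dogs & cats";

    // Premature encoding: HTML-encode before storing.
    $stored = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
    // dogs &amp; cats

    // Any non-HTML consumer (PDF, plain-text e-mail, CSV export) now shows the entity.
    $receiptLine = 'Comment: ' . $stored;
    // Comment: dogs &amp; cats

    // A template that correctly encodes on output ends up double-encoding.
    $doubleEncoded = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
    // dogs &amp;amp; cats

    // Storing the raw text and encoding only at output time avoids both problems.
    $correctHtml = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
    // dogs &amp; cats
    ```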

    In code:

    form.html:

    <form action="post.php" method="get">
        <textarea name="comment">I really &lt;3 dogs &amp; cats ;')</textarea>
        <input type="submit"/>
    </form>
    

    URL it generates:

    http://www.example.org/post.php?comment=I+really+%3C3+dogs+%26+cats+%3B%27%29
    

    post.php:

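    // Connect to database, etc.
    // (The mysql_* functions used below were removed in PHP 7.0;
    // on current PHP use the mysqli_* or PDO equivalents.)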
    
    // Place the new comment in the database
    $comment = $_GET['comment']; // Comment is encoded as PHP string
    
    // Using $comment in a SQL query, need to encode the string to SQL first!
    $query = "INSERT INTO posts SET comment='". mysql_real_escape_string($comment) ."'";
    mysql_query($query);
    
    // Get list of comments from the database
    $query = "SELECT comment FROM posts";
    $result = mysql_query($query); // Run the query; rows are fetched from its result
    
    print '<html><body><h2>Posts</h2>';
    print '<table>';
    
    while ($post = mysql_fetch_assoc($result)) {
        // Going from PHP string to HTML, need to encode!
        print '<tr><td>'. htmlspecialchars($post['comment']) .'</td></tr>';
    }
    
    print '</table>';
    print '</body></html>';
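
    For the date question (item 2 in the question), encoding alone is not quite the point: a date has a known shape, so it can be validated and normalized before it goes anywhere. A minimal sketch, assuming the input is always mm/dd/yy; the function name parseUsDate is hypothetical.

    ```php
    <?php
    // Hypothetical helper: returns the date as YYYY-MM-DD, or null if the
    // input is not a real date in mm/dd/yy form.
    function parseUsDate(string $input): ?string
    {
        // "!" resets the time fields so only the parsed date is kept.
        $date = DateTime::createFromFormat('!m/d/y', $input);
        $errors = DateTime::getLastErrors();

        if ($date === false) {
            return null; // wrong shape, trailing junk, etc.
        }
        // Impossible dates such as 02/30/11 parse but raise a warning.
        // getLastErrors() returns false on PHP 8.2+ when nothing went wrong.
        if (is_array($errors) && ($errors['warning_count'] > 0 || $errors['error_count'] > 0)) {
            return null;
        }
        return $date->format('Y-m-d');
    }
    ```

    With this, parseUsDate('10/13/11') gives '2011-10-13', while '02/30/11' or anything with extra text appended gives null. The normalized value still goes through the SQL and HTML steps above when it is stored or displayed.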