php, urlencode, sanitization

URL encode and filter sanitize output problems


I am trying to figure out why a sanitized string is output differently from a non-sanitized string when it is URL encoded.

I don't know what this is called. I've searched Google for URL encoding and sanitization, but I can't find any explanation.

I discovered this by accident after publishing a video. I insert titles into the database, fetch them back out, and build a URL from them.

Sample URL (which does not work because of this problem):

localhost/proviin/video/kojima%26%2339%3Bs+cancelled+masterpiece+-+investigating+silent+hills/16

I made a single-page test to see what was going on. The behavior is shown below.

How I need the outcome to be (but this is not sanitized):

$title = "Kojima's Cancelled Masterpiece - Investigating Silent Hills";
echo $title;
echo "<br>";
echo urlencode($title);

Output (which would work in the URL):

  • Kojima's Cancelled Masterpiece - Investigating Silent Hills
  • Kojima%27s+Cancelled+Masterpiece+-+Investigating+Silent+Hills

How it currently is (sanitized):

$title = sanitize("Kojima's Cancelled Masterpiece - Investigating Silent Hills", "str");
echo $title;
echo "<br>";
echo urlencode($title);

Output (which does not work in the URL, but is sanitized):

  • Kojima's Cancelled Masterpiece - Investigating Silent Hills

  • Kojima%26%2339%3Bs+Cancelled+Masterpiece+-+Investigating+Silent+Hills

Sanitize function

function sanitize($item, $type) {
    switch ($type) {
        case "str":
            return filter_var($item, FILTER_SANITIZE_STRING);
            break;
        case "mail":
            return filter_var($item, FILTER_SANITIZE_EMAIL);
            break;
        case "url":
            return filter_var($item, FILTER_SANITIZE_URL);
            break;
        case "int":
            return filter_var($item, FILTER_SANITIZE_NUMBER_INT);
            break;
        case "float":
            return filter_var($item, FILTER_SANITIZE_NUMBER_FLOAT);
            break;
        default:
            return false;
    }
}

As far as I know:

You sanitize data before inserting into the database.

You escape (htmlspecialchars) when you echo.

But why are sanitized strings output differently when using urlencode()?

If this is the normal behavior, how do I sanitize strings before inserting them into a database table and still use them in a URL with urlencode()?


Solution

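  • You are double-escaping your strings. Your sanitize function turns the apostrophe into the HTML entity &#39;, and urlencode() then percent-encodes the &, # and ; of that entity, which is where %26%2339%3B comes from. You should not pass the return value of your sanitize function to urlencode(). Both escape the data, but in different ways, so they cannot be chained like you're doing here (not that any escape function should be run twice anyway).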

    So no, you don't need to sanitize your data like this before you insert it into the database. Insert it using prepared statements (bound parameters) so that it is stored unchanged and comes back from the database exactly as it went in, ready for urlencode() or htmlentities() to work their magic at output time. Unless you need the data stored in a specific way, in which case a preg_replace is probably better.

    Also, be aware that user input should not be passed to unserialize() either: http://php.net/manual/en/function.unserialize.php
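
To make the double-escaping concrete, here is a minimal sketch using the title from the question. The sanitized value is written as a literal (the form FILTER_SANITIZE_STRING produced in the question) because that filter is deprecated as of PHP 8.1. The saveTitle() helper, the videos table and its title column are hypothetical names used only for illustration, not part of the original code.

```php
<?php
// The raw title, exactly as the user typed it.
$raw = "Kojima's Cancelled Masterpiece - Investigating Silent Hills";

// What ended up in the database in the question: the apostrophe has
// already been turned into the HTML entity &#39; by the sanitize step.
$sanitized = "Kojima&#39;s Cancelled Masterpiece - Investigating Silent Hills";

// urlencode() percent-encodes the entity's own characters:
// & -> %26, # -> %23, ; -> %3B. This is the broken URL segment.
$broken = urlencode($sanitized);
// Kojima%26%2339%3Bs+Cancelled+Masterpiece+-+Investigating+Silent+Hills

// Encode the raw value once, for the context it is being used in.
$forUrl  = urlencode($raw);
// Kojima%27s+Cancelled+Masterpiece+-+Investigating+Silent+Hills
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
// Kojima&#039;s Cancelled Masterpiece - Investigating Silent Hills

// Rows that were already stored in sanitized form can be decoded back.
$repaired = html_entity_decode($sanitized, ENT_QUOTES, 'UTF-8');

// Hypothetical helper: store the raw title through a bound parameter,
// so no escaping or sanitizing is applied to the stored value.
// Assumes a table "videos" with a "title" column.
function saveTitle(PDO $pdo, string $title): void
{
    $stmt = $pdo->prepare('INSERT INTO videos (title) VALUES (:title)');
    $stmt->execute([':title' => $title]);
}
```

The idea is that the database holds the raw title, and each output context (URL, HTML) applies its own single encoding step when the value is used.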