I have a website that runs on PHP with a MySQL database. I was wondering how best to treat user input with regard to HTML encoding (I am well aware that I should store it as received and encode it on output: that's what I do), and this cycle in particular:
<input value="per&ograve;">
and when the user submits it the server will receive per&ograve; instead of però. Now my question is: should the server decode all the received inputs so that per&ograve; gets decoded to the original però?
My doubt is that this would mean that if a user inputs &egrave; as his username, it will be registered as è and not as he actually intended...
I know this is not such a big problem (I don't know of many users who would want to use HTML entity literals in their usernames...), but it puzzled me and I could not find a completely satisfying solution.
Unless I've misunderstood what you're asking, you seem to have the wrong impression about the effect of outputting HTML-encoded strings into text inputs. Here's a basic example of what will happen. Let's say you have a user who wants to be named PB&J. Sure, it's weird, but not everyone can pick a nice non-weird username like "Bonvi" or "Don't Panic".
So you save that in your database as is.
Later, when you're using it in another form, you escape it for output.
<input type="text" name="username" value="<?= htmlspecialchars($username) ?>">
In your page source, you'll see
<input type="text" name="username" value="PB&amp;J">
with the ampersand converted to an HTML entity. (Which is what you want, in case they really wanted to be named bob"><script>alert("però!")</script><p class="ha or something worse.)
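To see why the escaping matters, here is a minimal sketch of what htmlspecialchars does to that hostile name. The ENT_QUOTES flag and the explicit 'UTF-8' charset are my assumptions, not something stated in the question.

```php
<?php
// Hostile username taken from the example in the text.
$username = 'bob"><script>alert("però!")</script><p class="ha';

// Assumption: the page is served as UTF-8; ENT_QUOTES also escapes single quotes.
$escaped = htmlspecialchars($username, ENT_QUOTES, 'UTF-8');

// The quotes and angle brackets are neutralised, so the value cannot
// break out of the attribute. The ò is left alone by htmlspecialchars.
echo '<input type="text" name="username" value="' . $escaped . '">', PHP_EOL;
// value="bob&quot;&gt;&lt;script&gt;alert(&quot;però!&quot;)&lt;/script&gt;&lt;p class=&quot;ha"
```

The browser shows the original text in the box, but it never treats any of it as markup.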
But the value displayed in the text box will be PB&J, and when the user submits the form, the value in $_POST['username'] will be PB&J, not PB&amp;J. It will not be changed to the encoded value.
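If you want to convince yourself without a browser, the round trip can be roughly simulated in PHP. This is only a sketch: htmlspecialchars_decode stands in for the browser's parsing of the attribute value, which is an approximation I'm assuming for illustration, not what the browser literally runs.

```php
<?php
// Value stored in the database exactly as the user typed it.
$username = 'PB&J';

// What ends up in the page source inside value="...".
$inSource = htmlspecialchars($username, ENT_QUOTES, 'UTF-8');

// Stand-in for the browser: it decodes character references when it
// parses the attribute, so the form field holds the decoded text.
$submitted = htmlspecialchars_decode($inSource, ENT_QUOTES);

echo $inSource, PHP_EOL;   // PB&amp;J
echo $submitted, PHP_EOL;  // PB&J
var_dump($submitted === $username); // bool(true)
```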
(I used htmlspecialchars in this example, but the same would apply with your example using però with htmlentities.)
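Here is the same idea sketched with htmlentities, including the case you were worried about, where a user literally types &egrave; as the name. roundTrip is a hypothetical helper I made up for this illustration, and html_entity_decode is again only an approximation of what the browser does when it reads the attribute.

```php
<?php
// Hypothetical helper: returns what lands in the page source and what a
// browser would send back after decoding the attribute value.
function roundTrip(string $stored): array
{
    // Assumption: UTF-8 throughout.
    $inSource = htmlentities($stored, ENT_QUOTES, 'UTF-8');
    // Stand-in for the browser's attribute parsing (decodes exactly once).
    $submitted = html_entity_decode($inSource, ENT_QUOTES, 'UTF-8');
    return [$inSource, $submitted];
}

foreach (['però', '&egrave;'] as $stored) {
    [$inSource, $submitted] = roundTrip($stored);
    printf("%s => %s => %s\n", $stored, $inSource, $submitted);
}
// però => per&ograve; => però
// &egrave; => &amp;egrave; => &egrave;
```

In both cases the server gets back exactly what was stored, so there is nothing to decode on input, and a literal &egrave; stays a literal &egrave;.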
I'm trying to explain it basically, so I apologize if I did misunderstand you - I don't intend to sound condescending.