
Pros and Cons of Using HTML Codes vs Special Characters


When building websites for non-English-speaking countries, you have tons of characters that fall outside the basic ASCII range.

For the database I usually encode it in either UTF-8 or Latin-1.

For the fixed text in the HTML, I would like to know whether there is any difference in performance, speed, space usage, etc. between using, for example,

&aacute; or á

which look exactly the same when rendered: á or á

Here is what I have so far for using the actual character with UTF-8:

Pros:

  • Easy to read for the developers and the web administrator
  • Only one character occupied in the code instead of 4-5
  • Easier to extract an excerpt from a text
  • 1 byte in Latin-1 (2 bytes in UTF-8) against 8 bytes for &aacute; (according to my tests)
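The byte counts in the last point depend on the encoding, and they are easy to check. A minimal Python sketch, assuming the entity in question is `&aacute;` (the variable names are only illustrative):

```python
# Compare how many bytes each representation of "á" takes when stored or sent.
char = "á"
entity = "&aacute;"

sizes = {
    "latin-1 character": len(char.encode("latin-1")),
    "utf-8 character": len(char.encode("utf-8")),
    "named entity": len(entity.encode("ascii")),
}

for label, size in sizes.items():
    print(f"{label}: {size} byte(s)")
```

The literal character is smaller than the entity in either encoding, although for a handful of accented letters per page the difference is unlikely to matter in practice.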

Cons:

  • When sending files to other developers, depending on the IDE, software, etc. that they use to read the code, accented characters such as é can get corrupted
  • Automatic minification of the code sometimes breaks them too
  • They usually break when the text is read with a different encoding

From my perspective, these cons carry more weight than the pros because they affect the visitor.


Solution

  • Just use the actual character á.

    This is for many reasons.

    First: separation of concerns. The database shouldn't know about HTML. Just imagine that at a later date you want to create an API so the data can be used by another service or a mobile app.

    Second: just use UTF-8 for your database, not Latin-1. Again, think ahead: what if your app suddenly needs to support Japanese? How would you store あ?
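To illustrate the Japanese example, here is a small Python sketch. It uses Python's own codecs rather than a real database, so treat it as an approximation of what a Latin-1 column can hold; `can_encode` is a hypothetical helper name.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode("á", "latin-1"))   # True: á exists in Latin-1
print(can_encode("あ", "latin-1"))  # False: あ has no Latin-1 code point
print(can_encode("あ", "utf-8"))    # True: UTF-8 covers all of Unicode
```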

    You always have the chance to convert it to HTML codes if you really have to... in a view. HTML is an implementation detail, not core to your app.
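If you do need HTML codes at the view layer, the conversion can be done at output time. A minimal sketch in Python using only the standard library; `to_html_entities` is a hypothetical helper, and it emits numeric references (such as `&#233;`) rather than named ones:

```python
import html


def to_html_entities(text: str) -> str:
    """Escape markup characters, then turn non-ASCII characters into numeric references."""
    # Escape &, <, > and quotes first so the stored text cannot inject markup.
    escaped = html.escape(text)
    # Replace every character outside ASCII with a &#NNN; reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_html_entities("café & あ"))  # caf&#233; &amp; &#12354;
```

The stored data stays plain UTF-8, and only this one rendering step knows about HTML.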

    If your concern is the user, all major browsers in this day and age support UTF-8. Just use the right meta tag, <meta charset="utf-8">. Easy.

    If your problem is developers and their tools, take a look at http://editorconfig.org/ to enforce and automate line endings and the use of UTF-8 in your files. Maybe add some git attributes to the mix, and why not go the extra mile and have a git pre-commit hook run a checker to make sure everyone commits UTF-8 files.
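The checker mentioned above could be as simple as the following Python sketch. It is only one possible approach, and the function names are hypothetical. It assumes the hook passes it the paths of the files to check.

```python
import sys
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8(paths):
    """Return the paths whose contents are not valid UTF-8."""
    return [p for p in paths if not is_valid_utf8(Path(p).read_bytes())]


def main(argv):
    """Report offending files on stderr; return 1 if any were found, else 0."""
    bad = find_non_utf8(argv)
    for path in bad:
        print(f"not valid UTF-8: {path}", file=sys.stderr)
    return 1 if bad else 0


# Quick demonstration on in-memory bytes:
print(is_valid_utf8("á".encode("utf-8")))    # True
print(is_valid_utf8("á".encode("latin-1")))  # False: a lone 0xE1 byte is not valid UTF-8
```

In a hook script you would call `sys.exit(main(sys.argv[1:]))` with the staged file names, so that a non-zero exit status blocks the commit.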

    Computer time is cheap, developer time is expensive: á is easier to change and understand, just use it.