Tags: windows, utf-8, character-encoding, notepad

UTF-8 characters missing or displayed as boxes in Notepad, but fine in web browsers and other text editors


I have UTF-8 text stored in a database and served as text/plain; charset=utf-8 by a web application. Everything works there: the text displays in the browser window without any problem.

But when I save that text to a file and open it in Windows Notepad, some characters are missing or displayed as small rectangular boxes. However, the same file looks fine in other editors such as EditPlus and Notepad++.

What causes this, and how can I solve it?


Solution

  • If it looks fine in other editors, then the text itself is fine. If it looks OK in the browser, then the response is probably fine too (but it is worth checking the page info in the browser to confirm the encoding). The problem is probably Notepad itself. It sometimes needs a BOM (byte order mark) at the start of the file to detect Unicode properly, but a BOM can break other applications that don't support it. You should also try Notepad on different versions of Windows; I have just tried opening a UTF-8 file in Windows 7, and it looks fine to me.
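
If you want to test whether a BOM helps, here is a minimal Python sketch of writing the file with one and checking for it afterwards. The question does not say what language the web application uses, so Python is an assumption here, and the function names `save_with_bom` and `has_utf8_bom` are made up for illustration. It relies on Python's standard `utf-8-sig` codec, which writes the three BOM bytes (EF BB BF) before the text.

```python
import codecs


def save_with_bom(text: str, path: str) -> None:
    """Write text as UTF-8 with a leading BOM (hypothetical helper)."""
    # The "utf-8-sig" codec emits the BOM bytes EF BB BF once, at the start.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 BOM (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

Reading the file back with `encoding="utf-8-sig"` strips the BOM again, while plain `utf-8` leaves it in the text as a leading U+FEFF character. That is the kind of surprise that can break applications that do not expect a BOM, so it is probably best to add one only to files meant to be opened in Notepad.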