
How to display UTF-8 characters in Vim correctly


I want/need to edit files with UTF-8 characters in them and I want to use Vim for it. Before I get accused of asking something that was asked before: I've read the Vim documentation on encoding, fileencoding[s], termencoding and more, googled the subject, and read this question among other texts.

Here is a sentence with a UTF-8 character in it that I use as a test case.

From Japanese 勝 (katsu) meaning "victory"

If I open the (UTF-8) file with Notepad it is displayed correctly. When I open it with Vim, the best I get is a black square where the Japanese character for katsu should be. Changing any of the settings for fileencoding or encoding does not make a difference.

Why is Vim giving me a black square where Notepad displays it without problems? If I copy the text from Vim and paste it into Notepad, it is displayed correctly, indicating that the text is not corrupted, merely displayed wrong. But which setting(s) influence that?

Here is the relevant part of my _vimrc:

if has("multi_byte")
  set encoding=utf-8
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  setglobal fileencoding=utf-8
  set fileencodings=ucs-bom,utf-8,latin1
endif

The actual settings when I open the file are:

encoding=utf-8
fileencoding=utf-8
termencoding=utf-8
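
To confirm how Vim actually decoded the buffer (beyond just reading the option values), you can inspect the character under the cursor directly. A sketch of the commands involved; the exact output shown in the comments is what a correctly decoded UTF-8 buffer should report for 勝:

```vim
" Query the encoding options in effect for this buffer:
:setlocal fileencoding?
:set encoding? termencoding?

" With the cursor on the Japanese character, :ascii (the ex
" equivalent of pressing ga) prints its code point; for 勝 it
" should read: <勝> 21213, Hex 52dd, Octal 51335
:ascii

" Pressing g8 in normal mode prints the raw UTF-8 bytes under
" the cursor; for 勝 that is: e5 8b 9d
```

If ga/:ascii reports a single code point rather than three separate byte values, the file was decoded correctly and any remaining problem is on the display side.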

My PC is running Windows 10, language is English (United States).

This is what the content of the file looks like after loading it in Vim and converting it to hex:

0000000: efbb bf46 726f 6d20 4a61 7061 6e65 7365  ...From Japanese
0000010: 20e5 8b9d 2028 6b61 7473 7529 206d 6561   ... (katsu) mea
0000020: 6e69 6e67 2022 7669 6374 6f72 7922 0d0a  ning "victory"..

The first three bytes (ef bb bf) are the UTF-8 byte order mark that Microsoft tools like to prepend; the rest is plain ASCII except for the second, third and fourth bytes on the second line (e5 8b 9d), which must represent the non-ASCII character somehow.
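
Those three bytes are in fact a well-formed three-byte UTF-8 sequence. As a sketch of the decoding arithmetic (done in Vim script purely for illustration):

```vim
" Three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx
"   e5 = 1110 0101  -> payload bits 0101   (e5 AND 0f = 05)
"   8b = 10 001011  -> payload bits 001011 (8b AND 3f = 0b)
"   9d = 10 011101  -> payload bits 011101 (9d AND 3f = 1d)
" Concatenating the payloads gives the code point:
:echo printf('U+%04X', 0x05 * 0x1000 + 0x0B * 0x40 + 0x1D)
" -> U+52DD, which is 勝 (katsu)
:echo nr2char(0x52DD)
```

So the bytes on disk are fine; the file really contains U+52DD, and the question is purely one of getting Vim to decode and render it.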


Solution

  • There are two steps to make Vim successfully display a UTF-8 character:

    1. File encoding. You've correctly identified that this is controlled by the 'encoding' and 'fileencodings' options. Once you've properly set this up (which you can verify via :setlocal fileencoding?, or the ga command on a known character, or at least by checking that each character is represented by a single cell, not its constituent byte values), there's:
    2. Character display. That is, you need to use a font that contains the needed glyphs. Unicode is large; most fonts don't cover all of it. In my experience, that's less of a problem on Linux, which seems to have some automatic font fallbacks built in. But on Windows, you need to have a proper font installed and configured (in gVim: via 'guifont').

    For example, to properly display Japanese Kanji characters, you need to install the East Asian language support in Windows, and then

    :set guifont=MS_Gothic:h12:cSHIFTJIS
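
    If you'd rather keep a Latin font for most text, gVim can route only the double-width (CJK) characters to a Japanese font via 'guifontwide'. A sketch assuming Consolas and MS Gothic are installed (substitute fonts you actually have):

    ```vim
    " Latin glyphs from Consolas, CJK glyphs from MS Gothic.
    set guifont=Consolas:h11
    set guifontwide=MS_Gothic:h11

    " Windows gVim builds with DirectWrite support can also
    " switch renderers, which tends to handle Unicode glyphs
    " better than the default GDI renderer:
    if has('directx')
      set renderoptions=type:directx
    endif
    ```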