Tags: windows, winapi, unicode, character-encoding, codepages

How to convert encodingName to codePage identifier?


Given the name of an encoding, how can I get the corresponding code page identifier?

For example:

  • "iso-8859-1": 28591
  • "windows-1252": 1252
  • "IBM500": 500
  • "utf-16le": 1200
  • "utf-8": 65001

Hypothetical use case: the Windows function MultiByteToWideChar only takes a CodePage, and I only have an encodingName.

EnumSystemCodePages does not help either: it hands each code page to its callback as a string, not as a numeric identifier, so the result cannot be passed directly to GetCPInfo.


Solution

  • There is no Win32 API for what you ask.

    If you can use .NET, call the Encoding.GetEncoding(String) method to get a System.Text.Encoding instance for the encoding name, and then read its CodePage property.

    Otherwise, you can look in the following Registry keys:

    HKEY_CLASSES_ROOT\MIME\Database\Codepage
    HKEY_CLASSES_ROOT\Mime\Database\Charset
    

    Note, however, that the Registry contains some inaccuracies. For example, iso-8859-1 is mapped to code page 1252 instead of the preferred 28591.

    Otherwise, you will just have to create your own lookup table in your code as needed.
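
As a rough illustration of the lookup-table approach, here is a minimal C++ sketch. The function name tryGetCodePage and the helper toLowerAscii are hypothetical, not part of any Windows API, and the table only contains the five encodings listed in the question. It assumes encoding names are plain ASCII and should be compared case-insensitively; a real table would need many more entries and aliases.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical helper: lowercase an ASCII encoding name so that
// "UTF-8" and "utf-8" resolve to the same table entry.
static std::string toLowerAscii(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Hypothetical lookup: writes the code page identifier to codePage and
// returns true when the name is known, otherwise returns false and leaves
// codePage untouched. A bool result is used because 0 is itself a valid
// code page value (CP_ACP) and so cannot act as a "not found" marker.
bool tryGetCodePage(const std::string& encodingName, unsigned int& codePage) {
    // Entries taken from the examples in the question; extend as needed.
    static const std::unordered_map<std::string, unsigned int> table = {
        {"iso-8859-1", 28591},
        {"windows-1252", 1252},
        {"ibm500", 500},
        {"utf-16le", 1200},
        {"utf-8", 65001},
    };

    const auto it = table.find(toLowerAscii(encodingName));
    if (it == table.end()) {
        return false;
    }
    codePage = it->second;
    return true;
}
```

When tryGetCodePage returns true, the value it writes can be passed as the CodePage argument of MultiByteToWideChar. Note that code page 1200 (utf-16le) is listed only because it appears in the question; text in that encoding is already UTF-16 and does not need MultiByteToWideChar.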