
Does C# use UTF-8 or UTF-16 for strings?


To be more precise: in the latest version of C# (C# 12 on .NET 8.0), are strings stored as UTF-8 or UTF-16?

I am confused because of this statement in https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-encoding-introduction:

A string is logically a sequence of 16-bit values, each of which is an instance of the char struct.

And here: https://learn.microsoft.com/en-us/dotnet/core/compatibility/globalization/5.0/icu-globalization-api

.NET 5 and later versions use International Components for Unicode (ICU) libraries for globalization functionality when running on Windows 10 May 2019 Update or later.

And what happens when it runs on Linux? Do I have to provide the ICU library myself? Or does C# still use 16-bit values, strip the zero bytes for all Latin-script languages, and then map the result to ICU?


Solution

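  • In C#, strings are stored internally as UTF-16: a string is a sequence of 16-bit char values (UTF-16 code units), so it always contains Unicode text, specifically UTF-16. Note that a char is not always a whole character: characters in the Basic Multilingual Plane occupy one char (16 bits), while characters outside it, such as most emoji, occupy two chars (a surrogate pair, 32 bits). UTF-8 only comes into play when you explicitly encode a string to bytes, for example with Encoding.UTF8.GetBytes.
  • The ICU libraries mentioned in the question provide globalization features (culture-aware comparison, sorting, casing and so on); they do not change how strings are stored in memory. On Linux, .NET uses the ICU library installed on the system (for example the libicu package) unless the app is configured to run in invariant globalization mode.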
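A minimal sketch of the UTF-16 representation in practice, assuming a .NET Core 3.0+ console app using top-level statements (for example .NET 8). The sample string is arbitrary: it mixes ASCII, a BMP character (U+00E9) and a non-BMP emoji (U+1F600). It shows that Length counts UTF-16 code units, that the emoji is stored as a surrogate pair, and that UTF-8 bytes only appear after an explicit encoding step.

```csharp
using System;
using System.Text;

// "Hi " (3 ASCII chars) + "é" (U+00E9, inside the BMP) + "😀" (U+1F600, outside the BMP).
// Escapes are used so the example does not depend on the source file's encoding.
string s = "Hi \u00E9\U0001F600";

// Length counts UTF-16 code units (chars), not user-visible characters.
Console.WriteLine(s.Length);                          // 6
Console.WriteLine(sizeof(char));                      // 2 (bytes per char)

// The emoji occupies two chars: a high and a low surrogate.
Console.WriteLine(char.IsSurrogatePair(s[4], s[5]));  // True

// Count Unicode scalar values (Rune requires .NET Core 3.0 or later).
int runeCount = 0;
foreach (Rune r in s.EnumerateRunes())
{
    runeCount++;
}
Console.WriteLine(runeCount);                         // 5

// Encoding.Unicode is UTF-16 little-endian: 2 bytes per char, no BOM from GetBytes.
byte[] utf16 = Encoding.Unicode.GetBytes(s);
// UTF-8 bytes exist only after an explicit conversion like this one.
byte[] utf8 = Encoding.UTF8.GetBytes(s);
Console.WriteLine(utf16.Length);                      // 12
Console.WriteLine(utf8.Length);                       // 9 (3 ASCII + 2 for "é" + 4 for the emoji)
```

The difference between 12 and 9 bytes is only visible after encoding; in memory the string is always the 6-char UTF-16 sequence.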