c# locale codepages

How to get a list of code pages from a string


I have a string that mixes text from different code pages: string multi = "EnglishРусский日本語";

I need to return the list of code pages it uses:

int[] GetCodePage(string multi)
{
   return new int[] {1252, 1251, 932};
}

Solution

  • From your comments, it seems that your problem is different.

    If you only need to check whether a filename (a string) uses only characters from the "default code page", then it is quite simple. The Windows API works with Unicode plus a single non-Unicode code page, which is the default code page for non-Unicode programs. Encoding.Default is that Windows non-Unicode code page (on .NET Framework; on .NET Core and later, Encoding.Default always returns UTF-8).

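    // Requires: using System; using System.Text;
    public static void Main()
    {
        // Name of the default (non-Unicode) encoding
        Console.WriteLine(Encoding.Default.BodyName);

        // I live in Italy, where Windows-1252 is the default code page,
        // and these accented letters are part of it
        Console.WriteLine(CanBeEncoded(Encoding.Default, "Hello world àèéìòù"));

        // Cyrillic letters are not part of Windows-1252
        Console.WriteLine(CanBeEncoded(Encoding.Default, "Русский"));
    }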
    

    and the interesting code:

    public static bool CanBeEncoded(Encoding enc, string str)
    {
        // We want to modify the Encoding, so we have to clone it
        enc = (Encoding)enc.Clone();
        enc.EncoderFallback = new EncoderExceptionFallback();
    
        try
        {
            enc.GetByteCount(str);
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    
        return true;        
    }
    

    Note that this code could be optimized. Using an exception to check whether the string can be encoded isn't optimal (but it is easy to write :-) ). A better solution would be to subclass the EncoderFallback class, so that an unencodable character is recorded instead of thrown.
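
    A rough sketch of the EncoderFallback idea, in case it helps. The names FlagEncoderFallback, FlagBuffer and EncodingCheck are made up for this example (they are not part of .NET); only the overridden members come from the framework. The fallback just records that it was invoked and substitutes nothing, so no exception is involved:

    ```csharp
    using System;
    using System.Text;

    // Hypothetical fallback: remembers that a character could not be encoded
    // instead of throwing an exception.
    public sealed class FlagEncoderFallback : EncoderFallback
    {
        // True as soon as the encoder hits a character it cannot encode.
        public bool Failed { get; private set; }

        // This fallback never produces replacement characters.
        public override int MaxCharCount => 0;

        public override EncoderFallbackBuffer CreateFallbackBuffer() => new FlagBuffer(this);

        private sealed class FlagBuffer : EncoderFallbackBuffer
        {
            private readonly FlagEncoderFallback owner;

            public FlagBuffer(FlagEncoderFallback owner) { this.owner = owner; }

            // Nothing is ever queued for output.
            public override int Remaining => 0;

            // Called for a single unencodable character.
            public override bool Fallback(char charUnknown, int index)
            {
                owner.Failed = true;
                return false; // no substitute characters
            }

            // Called for an unencodable surrogate pair.
            public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
            {
                owner.Failed = true;
                return false;
            }

            public override char GetNextChar() => '\0';

            public override bool MovePrevious() => false;
        }
    }

    public static class EncodingCheck
    {
        public static bool CanBeEncoded(Encoding enc, string str)
        {
            var fallback = new FlagEncoderFallback();

            // Built-in encodings are read-only, so work on a clone
            enc = (Encoding)enc.Clone();
            enc.EncoderFallback = fallback;

            // Runs the encoder over the string; the byte count itself is not needed
            enc.GetByteCount(str);

            return !fallback.Failed;
        }
    }
    ```

    It is called the same way as the version above. For instance, EncodingCheck.CanBeEncoded(Encoding.ASCII, "Русский") should come back false, because the fallback is invoked for the Cyrillic letters. Note that it still scans the whole string even after the first failure; stopping early would need a manual loop over an Encoder, which I have left out.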