c#, .net, encoding, character-encoding, .net-core-3.1

1 byte 8 bit encoding


I need to create a System.String from a file in some unknown ASCII-compatible single-byte encoding so that I can replace some numbers in the text with a regex. Encoding.ASCII is 7-bit and UTF-8 is multi-byte, so neither will round-trip back to the same byte sequence.

Is there an encoding in .NET Core that can round-trip any byte sequence?

UPD: The Windows-1256 character set looks promising, but it is Windows-only.


Solution

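  • ISO-8859-1 (Latin-1) maps each byte 0x00..0xFF directly to the Unicode code point with the same value (U+0000..U+00FF, which includes the Latin-1 Supplement block) and back again, so any byte sequence round-trips. It is also one of the encodings .NET Core supports by default.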

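    // C#
    // requires: using System; using System.Linq; using System.Text;
    var enc = Encoding.GetEncoding(28591); // ISO-8859-1 (code page 28591)
    var b = Enumerable.Range(0, 0xFF + 1).Select(x => (byte)x).ToArray(); // every byte value, 0x00..0xFF

    // Decode, re-encode, and compare with the original bytes
    Console.WriteLine(enc.GetBytes(enc.GetString(b)).SequenceEqual(b)); // True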
    

    Moreover, each char has the same numeric value as the byte it was decoded from:

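    // F# (Encoding.Latin1 requires .NET 5 or later; on .NET Core 3.1 use Encoding.GetEncoding(28591))
    open System
    open System.Text
    let bytes = [| Byte.MinValue .. Byte.MaxValue |]
    let chars = Encoding.Latin1.GetChars bytes
    Array.map byte chars = bytes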
    
    val it: bool = true
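
    Applied to the original problem (decode, replace numbers with a regex, re-encode), the approach could look roughly like the sketch below. The class name Latin1RoundTrip, the helper ReplaceNumbers, and the sample bytes are made up for illustration and are not from the question. It assumes the replacement text contains only characters up to U+00FF; anything above that cannot be encoded in ISO-8859-1 and would come back as '?'.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Latin1RoundTrip
{
    // ISO-8859-1 (code page 28591) is available in .NET Core without
    // registering an extra encoding provider.
    private static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Hypothetical helper: replaces every run of ASCII digits with `replacement`
    // and leaves all other bytes exactly as they were in the input.
    public static byte[] ReplaceNumbers(byte[] input, string replacement)
    {
        // Each byte becomes the char with the same numeric value.
        string text = Latin1.GetString(input);

        // [0-9] is used instead of \d so that only ASCII digits match.
        string edited = Regex.Replace(text, "[0-9]+", replacement);

        // Each char (<= U+00FF) becomes the byte with the same numeric value.
        return Latin1.GetBytes(edited);
    }

    public static void Main()
    {
        // Sample data: 0xC0, space, "123", space, 0xFF
        byte[] input = { 0xC0, 0x20, 0x31, 0x32, 0x33, 0x20, 0xFF };
        byte[] output = ReplaceNumbers(input, "7");

        Console.WriteLine(BitConverter.ToString(output)); // C0-20-37-20-FF
    }
}
```

    The bytes 0xC0 and 0xFF pass through untouched whatever encoding the file really uses, because the string is only an intermediate representation and is never interpreted as real text. For a real file, File.ReadAllBytes and File.WriteAllBytes would take the place of the sample array.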