
Is there a way to get a symbol of a non-printable character?


I want to find a way to get the symbol of a non-printable character in C# (e.g. "SOH" for start of heading and "BS" for backspace). Any ideas?

Edit: I don't need to visualize the byte value of a non-printable character, but its code name as shown here https://web.itu.edu.tr/sgunduz/courses/mikroisl/ascii.html

Example would be "NUL" for 0x00, "SOH" for 0x01 etc.


Solution

  • You are probably looking for a kind of string dump that makes control characters visible. You can do it with the help of regular expressions, where \p{Cc} matches any control character:

    using System;
    using System.Text.RegularExpressions;
    
    ...
    
    string source = "BEL \u0007 then CR + LF  \r\n SOH \u0001 \0\0";
    
    // To get control characters visible, we match them and
    // replace with their codes
    string result = Regex.Replace(
      source, @"\p{Cc}", 
      m => $"[Control: 0x{(int)m.Value[0]:x4}]");
    
    // Let's have a look:
    
    // Initial string 
    Console.WriteLine(source);
    Console.WriteLine();
    // Control symbols visualized
    Console.WriteLine(result);
    

    Outcome:

    BEL   then CR + LF  
     SOH  
    
    BEL [Control: 0x0007] then CR + LF  [Control: 0x000d][Control: 0x000a] SOH [Control: 0x0001] [Control: 0x0000][Control: 0x0000]
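
    Note that \p{Cc} is the Unicode "Control" category, which is wider than the 32 ASCII codes below 0x20: it also covers DEL (0x7f) and the C1 range 0x80..0x9f. A minimal sketch to check this, where the class and method names are made up for illustration:

    ```csharp
    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    // Hypothetical helper, not part of the answer's code
    public static class ControlCategoryProbe
    {
        // Returns every UTF-16 code unit that the pattern \p{Cc} matches
        public static int[] MatchedCodes() =>
            Enumerable.Range(0, 0x10000)
                      .Where(c => Regex.IsMatch(((char)c).ToString(), @"\p{Cc}"))
                      .ToArray();

        public static void Main()
        {
            int[] codes = MatchedCodes();

            // 32 C0 codes + DEL + 32 C1 codes
            Console.WriteLine($"{codes.Length} control characters");
            Console.WriteLine($"first: 0x{codes.First():x4}, last: 0x{codes.Last():x4}");
        }
    }
    ```

    So any replacement that only knows names for 0x00..0x1f still needs a fallback for the remaining matches.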
    

    Edit: If you want to visualize them in a different way, you should edit the lambda

    m => $"[Control: 0x{(int)m.Value[0]:x4}]"
    

    For instance:

        // Standard abbreviations for the C0 control characters 0x00..0x1f
        static string[] knownCodes = new string[] {
          "NUL", "SOH", "STX", "ETX", "EOT", "ENQ",
          "ACK",  "BEL", "BS", "HT", "LF", "VT",
          "FF", "CR", "SO", "SI", "DLE", "DC1", "DC2",
          "DC3", "DC4", "NAK", "SYN", "ETB", "CAN",
          "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
        };
    
        private static string StringDump(string source) {
          if (null == source)
            return source;
    
          return Regex.Replace(
            source, 
            @"\p{Cc}", 
            m => {
              int code = (int)(m.Value[0]);
    
              // Known abbreviation if there is one; hex code otherwise
              // (DEL, 0x7f, and 0x80..0x9f are control characters too)
              return code < knownCodes.Length 
                ? $"[{knownCodes[code]}]" 
                : $"[Control 0x{code:x4}]";  
            });
        }
    

    Demo:

    Console.WriteLine(StringDump(source));
    

    Outcome:

    BEL [BEL] then CR + LF  [CR][LF] SOH [SOH] [NUL][NUL]
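
    If a single visible glyph is preferred over a bracketed abbreviation, the Unicode "Control Pictures" block is another option: U+2400..U+241F mirror the codes 0x00..0x1f, and U+2421 is the symbol for DEL. A minimal sketch, assuming the output font and console can display these glyphs; ToControlPictures is a made-up name, not an existing API:

    ```csharp
    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    // Hypothetical helper, not part of the answer's code
    public static class ControlPictures
    {
        // Replaces each control character with its Control Pictures glyph
        public static string ToControlPictures(string source)
        {
            if (source == null)
                return null;

            return Regex.Replace(source, @"\p{Cc}", m =>
            {
                int code = m.Value[0];

                if (code < 0x20)                    // U+2400..U+241F mirror 0x00..0x1f
                    return ((char)(0x2400 + code)).ToString();
                if (code == 0x7F)                   // U+2421 is SYMBOL FOR DELETE
                    return "\u2421";

                return $"[Control 0x{code:x4}]";    // C1 controls have no picture
            });
        }

        public static void Main()
        {
            // Assumption: the console needs UTF-8 output to show the glyphs
            Console.OutputEncoding = Encoding.UTF8;
            Console.WriteLine(ToControlPictures("BEL \u0007 then CR + LF  \r\n SOH \u0001 \0\0"));
        }
    }
    ```

    Each glyph is a miniature rendering of the abbreviation (for example, U+2401 shows "SOH"), so the dump stays one character wide per control character.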