Tags: c, string, character-encoding, language-lawyer, string-literals

What characters are legal to use in string literals?


I am wondering whether it is legal in C to put ASCII control characters such as TAB, BEL and ESC directly (unescaped) in a string literal.

There is no way to display these characters in plain text here on Stack Overflow, so I had to take a screenshot instead.

[Screenshot: example]

Characters that do not have a graphical representation are displayed using caret notation and highlighted in purple in the screenshot. There is also a TAB character on line 7 that indents the text.

This compiles without any warnings using gcc -std=c99 -pedantic, but is it really fully portable?

This is not something that I would use in any serious program. I am just curious whether the standards allow it.


Solution

  • The portable characters that can appear in the program source are exactly these:

    • the 26 uppercase letters of the Latin alphabet

      A  B  C  D  E  F  G  H  I  J  K  L  M
      N  O  P  Q  R  S  T  U  V  W  X  Y  Z
      
    • the 26 lowercase letters of the Latin alphabet

      a  b  c  d  e  f  g  h  i  j  k  l  m
      n  o  p  q  r  s  t  u  v  w  x  y  z
      

    • the 10 decimal digits

      0  1  2  3  4  5  6  7  8  9
      
    • the following 29 graphic characters

      !  "  #  %  &  '  (  )  *  +  ,  -  .  /  :
      ;  <  =  >  ?  [  \  ]  ^  _  {  |  }  ~
      
    • the space character, and control characters representing horizontal tab, vertical tab, and form feed.

    Source: the C standard, any version.

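    An implementation must accept these characters, and is allowed to accept any additional characters. A literal TAB inside a string literal is therefore portable, while literal BEL and ESC characters are not in this list, so whether they are accepted depends on the implementation.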