Tags: api, api-design, language-design, library-design

Should String and CharArray be the same thing?


When designing a programming language API, what are the advantages and disadvantages of:

  • The type String and an Array (or LinkedList) of chars are indistinguishable.

As in: Haskell, Erlang, C

  • String is its own type, and an Array (or LinkedList) of chars is a different type.

Like in: Java, C#, Lisp, JavaScript, ...


Solution

Reasons to have them be the same type:

  • Simplicity: there are fewer types to learn.
  • Orthogonality: all generic array code works on strings.

Reasons to have them be different:

  • Invariants can be enforced through the type. If strings are stored in an encoding where not every possible bit pattern is valid, such as UTF-8, and String is not its own type, then nothing prevents an invalid string from being constructed.

  • Multiple copies of the same string can be eliminated. A technique called “interning” keeps only one copy of each distinct value of a type in memory at a time. Languages commonly do this automatically for strings, including the languages with separate string types you mentioned (at least in some implementations of Lisp). Interning makes string comparisons more efficient, including comparisons of string keys in hash tables: you can compare the known-unique pointers instead of the string contents.
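To make the orthogonality point concrete, here is a rough Python sketch. Python is a language from the "separate type" camp, and its `str` is immutable, so a generic in-place array routine works on a list of characters but not on a string. The helper `reverse_in_place` is a made-up name for illustration only.

```python
from typing import MutableSequence


def reverse_in_place(items: MutableSequence) -> None:
    """Generic array code: reverse any mutable sequence in place."""
    i, j = 0, len(items) - 1
    while i < j:
        items[i], items[j] = items[j], items[i]
        i += 1
        j -= 1


# A string modelled as a plain array of chars: the generic routine just works.
chars = list("hello")
reverse_in_place(chars)
reversed_text = "".join(chars)

# A distinct string type does not have to support the same operations.
try:
    reverse_in_place("hello")  # str does not support item assignment
    str_supported = True
except TypeError:
    str_supported = False
```

In a language where the two types are the same (Haskell's `String` is a synonym for `[Char]`), no conversion step like `list(...)` and `"".join(...)` is needed.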
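The invariant argument can be sketched as follows. `Utf8String` is a hypothetical wrapper type, not part of any library; it only assumes that Python's built-in `bytes.decode("utf-8")` rejects malformed input, which it does in its default strict mode.

```python
class Utf8String:
    """Hypothetical string type whose constructor enforces valid UTF-8."""

    def __init__(self, raw: bytes) -> None:
        raw.decode("utf-8")      # raises UnicodeDecodeError on malformed input
        self._raw = bytes(raw)   # private copy, so callers cannot corrupt it later

    def __len__(self) -> int:
        return len(self._raw.decode("utf-8"))  # length in code points, not bytes

    def __str__(self) -> str:
        return self._raw.decode("utf-8")


valid = Utf8String("h\u00e9llo".encode("utf-8"))

# As a bare byte array, nothing stops a malformed sequence from existing:
# 0xC3 opens a two-byte sequence that never finishes.
as_plain_array = bytearray(b"caf\xc3")

try:
    Utf8String(bytes(as_plain_array))
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

Every `Utf8String` that exists is known to be well-formed, so code receiving one does not need to re-validate it. Code receiving a raw `bytearray` has no such guarantee.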
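As a small illustration of interning, Python exposes the technique explicitly through `sys.intern`. This sketch only shows the idea; how and when a given language interns strings automatically is implementation-specific.

```python
import sys

# Build two equal strings at run time so they are produced independently.
parts = ["language", "design"]
a = "-".join(parts)
b = "-".join(parts)

# Interning maps every equal string to one canonical object.
ia = sys.intern(a)
ib = sys.intern(b)

same_value = (a == b)      # compares the contents character by character
same_object = (ia is ib)   # compares identity only, no need to look at the contents
```

Once both keys are interned, an equality check can be answered by the identity test alone, which is the source of the speedup for comparisons and hash-table lookups mentioned above.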