Search code examples
xmlunicodecsvseparator

Vertical bar (|) Unicode replacement


We are using the vertical bar | (|) character as a field separator in one of our modules. so the users should not use this character in a title.

If they do use it, I would like to replace it with a similar character.

Is there a Unicode replacement for it? The only character that I have found and looks similar to it is broken vertical bar ¦ (¦).


Solution

  • I do not understand what you really need. Do you need to change the separator sequence to something guaranteed not to exist in the dataset?

    If so, then that’s what Unicode’s 66 “non-character” code points are specifically designed for. You can use them as internal sentinels knowing that they cannot occur in valid data.

    If you’re just looking for a visual lookalike, that’s very different. I would not suggest that, because there are lots of confusables. Here are just a few of those:

    U+0007C ‭ |  GC=Sm SC=Common       VERTICAL LINE
    U+000A6 ‭ ¦  GC=So SC=Common       BROKEN BAR
    U+002C8 ‭ ˈ  GC=Lm SC=Common       MODIFIER LETTER VERTICAL LINE
    U+002CC ‭ ˌ  GC=Lm SC=Common       MODIFIER LETTER LOW VERTICAL LINE
    U+02016 ‭ ‖  GC=Po SC=Common       DOUBLE VERTICAL LINE
    U+023D0 ‭ ⏐  GC=So SC=Common       VERTICAL LINE EXTENSION
    U+02758 ‭ ❘  GC=So SC=Common       LIGHT VERTICAL BAR
    U+02759 ‭ ❙  GC=So SC=Common       MEDIUM VERTICAL BAR
    U+0275A ‭ ❚  GC=So SC=Common       HEAVY VERTICAL BAR
    U+02AF4 ‭ ⫴  GC=Sm SC=Common       TRIPLE VERTICAL BAR BINARY RELATION
    U+02AF5 ‭ ⫵  GC=Sm SC=Common       TRIPLE VERTICAL BAR WITH HORIZONTAL STROKE
    U+02AFC ‭ ⫼  GC=Sm SC=Common       LARGE TRIPLE VERTICAL BAR OPERATOR
    U+02AFE ‭ ⫾  GC=Sm SC=Common       WHITE VERTICAL BAR
    U+02AFF ‭ ⫿  GC=Sm SC=Common       N-ARY WHITE VERTICAL BAR
    U+0FF5C ‭ | GC=Sm SC=Common       FULLWIDTH VERTICAL LINE
    U+0FFE4 ‭ ¦ GC=So SC=Common       FULLWIDTH BROKEN BAR