Tags: character-encoding, data-science, character, ascii

Is it possible to create your own character symbols?


So I have tried using the Private Character Editor to create character symbols, but it is impossible to export them to share with other users. I was wondering: is there a way to create your own character set? Is there a programming language for it, or a different OS? I know characters have a 64x64 grid, which is available in the Private Character Editor.

You may ask why I'm doing this. Well, I'm learning how character symbols are created, so I want to create and export them. I know that Unicode or ASCII dominates, but if anyone knows anything, please do let me know :)


Solution

  • Where do I start? You're mixing together several different layers/systems, so it's hard to know what exactly you need/want, but I'll try to unwind some of them to at least give you directions on where to continue looking.

    A "character" is a very vague term that can be used to refer to and/or be composed of any or all of the following:

    1. an abstract description of the character, like "the Latin capital letter A" to describe A.
    2. an ID of some kind to identify that character. In Unicode the ID of the Latin capital letter A is U+0041 (that's hex; the equivalent decimal value is 65). In Unicode that's called the "Unicode codepoint".
    3. a description of how to encode #1 and/or #2 into bytes to store in files, transfer over a network, and so on. ASCII is one such encoding (a very limited, old one that can only represent 128 characters). ISO-8859-1 (also known as Latin-1) is a slightly newer one, but it is still very limited (256 characters). UTF-8 and UTF-16 are modern ones that can encode every character that Unicode describes.
    4. one or more glyphs (i.e. physical shape descriptions) that tell the computer how that character is actually drawn on the screen (or printed).
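To make the four layers a bit more concrete, here is a small Python sketch that uses only the standard library's `unicodedata` module and `str.encode`. It is just an illustration: layer #4 cannot be shown this way, because the text itself never contains any shape information.

```python
import unicodedata

for ch in ("A", "\u00e9"):  # "A" and "é"
    # 1. Abstract description: Unicode gives characters a formal name.
    name = unicodedata.name(ch)
    # 2. ID: the Unicode codepoint, conventionally written as U+XXXX in hex.
    code_point = ord(ch)
    print(f"{ch}  {name}  U+{code_point:04X} (decimal {code_point})")

    # 3. Encodings: the same codepoint turned into bytes in different ways.
    for encoding in ("ascii", "latin-1", "utf-8", "utf-16-be"):
        try:
            encoded = ch.encode(encoding).hex(" ")
        except UnicodeEncodeError:
            encoded = "(not representable)"
        print(f"    {encoding:>9}: {encoded}")

# 4. The glyph is not part of the string; it comes from the font used to draw it.
```

"A" encodes to the single byte 41 in ASCII, Latin-1 and UTF-8, while "é" has no ASCII byte at all, is one byte in Latin-1 and two bytes in UTF-8.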

    Now if you want to invent your own new character, you need to do something for all 4 of those.

    #1 is fairly straightforward: think of it and write it down.

    #2 is already harder: these days the Unicode specification is pretty much the standard for handing out those IDs. While that universality is great and has solved a ton of problems, you can't really just "add something" to Unicode on your own (that's also one of its major strengths, by the way).

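    There is an escape hatch, however: Unicode provides so-called Private Use Areas (PUAs), ranges of codepoints that Unicode explicitly promises never to assign an official meaning, so that they can be used by software internally or by private agreement. The main one is U+E000 to U+F8FF in the Basic Multilingual Plane (BMP); planes 15 and 16 (U+F0000 to U+FFFFD and U+100000 to U+10FFFD) are supplementary Private Use Areas.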
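A minimal Python sketch for checking whether a codepoint falls into one of the Private Use Areas. The ranges are the ones defined by Unicode; the helper name `is_private_use` is made up for this example. Python's `unicodedata` reports the general category "Co" (private use) for these codepoints, which gives a cross-check.

```python
import unicodedata

# The three Private Use Areas defined by the Unicode Standard.
PRIVATE_USE_AREAS = [
    (0xE000, 0xF8FF),      # BMP Private Use Area (6,400 codepoints)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

def is_private_use(code_point: int) -> bool:
    """Return True if code_point lies in one of the Private Use Areas."""
    return any(lo <= code_point <= hi for lo, hi in PRIVATE_USE_AREAS)

# Compare our range check with the category from Python's Unicode database.
for cp in (0x0041, 0xE000, 0xF8FF, 0xF0000, 0x10FFFD):
    print(f"U+{cp:04X}: private use = {is_private_use(cp)}, "
          f"category = {unicodedata.category(chr(cp))}")
```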

    You could simply pick one of those IDs and say that this is now your new character. Of course no one else will agree with you (that's the point of a Private Use Area), but we'll care about that later.

    We can solve #3 by simply using one of the universal encodings (preferably UTF-8), if we decide to use the PUA. If we don't use the PUA, then... well, you're basically out of luck, because you'd have to define an entirely new encoding and tell everyone else about it (and convince them to support and use it).
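Assuming, hypothetically, that you picked U+E000 (the first codepoint of the BMP Private Use Area), this Python sketch shows that UTF-8 and UTF-16 encode it like any other character, while legacy encodings such as ASCII and Latin-1 simply have no byte value for it.

```python
# Hypothetical pick: U+E000, the first codepoint of the BMP Private Use Area.
MY_CHAR = "\ue000"

# UTF-8 and UTF-16 can encode every codepoint, private-use ones included.
utf8_bytes = MY_CHAR.encode("utf-8")
utf16_bytes = MY_CHAR.encode("utf-16-be")
print(utf8_bytes.hex(" "))   # ee 80 80
print(utf16_bytes.hex(" "))  # e0 00

# Decoding gives back exactly the same codepoint.
assert utf8_bytes.decode("utf-8") == MY_CHAR

# ASCII (128 values) and Latin-1 (256 values) have no byte for it.
for legacy in ("ascii", "latin-1"):
    try:
        MY_CHAR.encode(legacy)
    except UnicodeEncodeError as err:
        print(f"{legacy}: cannot encode ({err.reason})")
```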

    And #4 is where we actually start defining what the character looks like. You say that "characters have 64x64 grid", but that is really just one possible approach. Most characters on modern computers are drawn with vector (outline) fonts, mostly TrueType and OpenType. What you describe is a so-called bitmap font.

    Fonts basically define the shape of characters (usually identified by their Unicode codepoint) by providing an image (either bitmap or vector) that the computer should use to draw that character. In reality it's a bit more complex than that, because a single glyph might represent multiple Unicode codepoints (a ligature, for example), or a single codepoint might be drawn with different glyphs depending on what's around it.
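As a rough illustration of the codepoint-to-glyph lookup, here is a toy "bitmap font" in Python: a dictionary from codepoint to rows of pixels. This is a deliberate simplification, not how TrueType or OpenType work internally, and it ignores the ligature and context effects mentioned above. The glyph shapes and the names `TOY_FONT`, `NOTDEF` and `render` are made up for the example; characters without a glyph fall back to an empty box, similar to the `.notdef` glyph ("tofu") that real fonts show for missing characters.

```python
# Toy font: codepoint -> bitmap rows ("X" = ink, "." = background).
# Every glyph here has 5 rows; widths may differ.
TOY_FONT = {
    ord("A"): [
        ".XX.",
        "X..X",
        "XXXX",
        "X..X",
        "X..X",
    ],
    0xE000: [  # hypothetical private-use glyph: a small diamond
        "..X..",
        ".X.X.",
        "X...X",
        ".X.X.",
        "..X..",
    ],
}

# Fallback for codepoints the font has no glyph for (an empty box).
NOTDEF = [
    "XXXX",
    "X..X",
    "X..X",
    "X..X",
    "XXXX",
]

def render(text: str, font: dict) -> str:
    """Draw text as ASCII art: look up each codepoint, one column gap between glyphs."""
    glyphs = [font.get(ord(ch), NOTDEF) for ch in text]
    rows = [" ".join(glyph[row] for glyph in glyphs) for row in range(len(NOTDEF))]
    return "\n".join(rows)

print(render("A\ue000B", TOY_FONT))  # "B" is not in the font, so it shows as a box
```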

    Now if you want to define your own font that has a glyph for your character, you simply have to assign the shape you want to the PUA codepoint you picked earlier.
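Since you were thinking in terms of pixel grids, one way to get a shareable file is to write a bitmap font in BDF (Adobe's Glyph Bitmap Distribution Format), a plain-text format that FontForge can open as a starting point for a TrueType or OpenType font. The sketch below is a minimal, hypothetical generator: the font name `MyPUAFont`, the 8x8 size, the fixed 72 dpi, the output file name and the diamond glyph are all made up for illustration (a 64x64 glyph works the same way, just with longer rows). It only writes the fields a simple single-size font needs, so check the BDF 2.1 specification before relying on it.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical 8x8 glyph for U+E000 ("X" = ink, "." = background).
DIAMOND = [
    "...XX...",
    "..XXXX..",
    ".XX..XX.",
    "XX....XX",
    "XX....XX",
    ".XX..XX.",
    "..XXXX..",
    "...XX...",
]

def row_to_hex(row: str) -> str:
    """Encode one bitmap row as BDF hex, padded on the right to whole bytes."""
    width = len(row)
    padded = (width + 7) // 8 * 8
    value = int(row.replace("X", "1").replace(".", "0"), 2) << (padded - width)
    return f"{value:0{padded // 4}X}"

def make_bdf(glyphs: dict[int, list[str]], font_name: str, pixel_size: int) -> str:
    """Build a minimal BDF 2.1 bitmap font at 72 dpi (so 1 point == 1 pixel)."""
    lines = [
        "STARTFONT 2.1",
        f"FONT {font_name}",  # conventionally an XLFD name; simplified here
        f"SIZE {pixel_size} 72 72",
        f"FONTBOUNDINGBOX {pixel_size} {pixel_size} 0 0",
        "STARTPROPERTIES 4",
        f"FONT_ASCENT {pixel_size}",
        "FONT_DESCENT 0",
        'CHARSET_REGISTRY "ISO10646"',  # ENCODING values are Unicode codepoints
        'CHARSET_ENCODING "1"',
        "ENDPROPERTIES",
        f"CHARS {len(glyphs)}",
    ]
    for code_point, rows in sorted(glyphs.items()):
        width, height = len(rows[0]), len(rows)
        lines += [
            f"STARTCHAR uni{code_point:04X}",
            f"ENCODING {code_point}",
            f"SWIDTH {round(width * 1000 / pixel_size)} 0",  # advance in 1/1000 em
            f"DWIDTH {width} 0",                              # advance in pixels
            f"BBX {width} {height} 0 0",
            "BITMAP",
            *(row_to_hex(r) for r in rows),
            "ENDCHAR",
        ]
    lines.append("ENDFONT")
    return "\n".join(lines) + "\n"

bdf = make_bdf({0xE000: DIAMOND}, font_name="MyPUAFont", pixel_size=8)
Path("mypua.bdf").write_text(bdf, encoding="ascii")
```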

    Now: no one else will know what you mean when you use that PUA codepoint, but if you tell them "oh, and make sure to render that with the font I provided", then it will at least look like you want it to.

    So at a high level, what you need to do to define your own character that you can share with your friends:

    1. pick a codepoint from one of the Private Use Areas. Conflicts with codepoints that others have picked can't be ruled out.
    2. create a font that has a glyph for that codepoint (using a tool like FontForge, for example)
    3. send some text with that codepoint to your friends and tell them to use the font you also sent them to display it.
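Putting the steps together on the text side, this Python sketch (with the hypothetical file name `message.txt` and the same hypothetical U+E000 pick) saves a message as UTF-8 and reads it back the way a friend would. The codepoint survives the trip intact, but Unicode has no name for it, which is exactly the "no one else will know what you mean" problem; only the font you send along gives it a shape.

```python
import unicodedata
from pathlib import Path

MY_CHAR = "\ue000"  # step 1: hypothetical pick from the BMP Private Use Area

# Step 3 (sender): save some text containing the character as UTF-8;
# this file would be sent together with the font from step 2.
message = f"Hello {MY_CHAR} world"
Path("message.txt").write_text(message, encoding="utf-8")

# Receiver: decode as UTF-8 and list the private-use characters found.
received = Path("message.txt").read_text(encoding="utf-8")
pua_chars = [f"U+{ord(c):04X}" for c in received if unicodedata.category(c) == "Co"]
print(pua_chars)  # ['U+E000']

# Unicode itself has no name for a private-use codepoint.
print(unicodedata.name(MY_CHAR, "<no Unicode name>"))  # <no Unicode name>
```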

    I glossed over some fairly involved details, but articles like this one should help fill in some of the gaps.