Search code examples
stringalgorithmcompression

String to string compression algorithm?


I'm looking for an algorithm that would compress some string to another string (i.e. without "\0" or special control characters), but I can't find anything on the internet. Is there such an algorithm? It doesn't have to be particularly efficient, just something basic.


Solution

  • Apparently you have some specific character set in mind and you want to use it for both the original string and the compressed string.

    Standard compression routines (e.g. gzip) work on byte strings.

    One idea is to take existing code (e.g. gzip's) and rewrite it to use your character set instead of bytes.

    Another is to construct a 1-to-1 mapping between strings in your character set and arbitrary byte strings, map the original string to a byte string, compress the byte string using a standard compression utility or function, and map the result back to a string using your character set. (Strictly speaking you can use two different mappings.)

    One way to construct the mapping is to pad your character set with dummies and a special pad character until you have 2^k different characters (for some k); then each 8 of your characters correspond to k bytes (and shorter strings can be padded with the pad character).