Tags: python, text, compression, byte, encode

How can I compress text in Python to only use certain characters?


How can I compress text in Python so that the output string contains only uppercase and lowercase letters, digits, "." (dot), and "_" (underscore)?

Here is an example that works, but its output contains no lowercase letters or dots, so the encoding is less compact than it could be.

import base64
import zlib

def compress(s):
  # Base32 output uses only A-Z and 2-7; the "=" padding is swapped for "-"
  return base64.b32encode(zlib.compress(s,level=9)).decode("ascii").replace("=","-")

print(compress(b"abc"))

Solution

  • You don't need the trailing equals signs. Strip them when encoding, then add them back when decoding, based on the length of the encoded data. The base64 functions also take an optional altchars argument that specifies the two characters to use instead of + and /.

    To encode:

    enc = base64.b64encode(dat, b'._').rstrip(b'=')
    

    To decode:

    dat = base64.b64decode(enc + b'=' * (-len(enc) & 3), b'._')
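
Putting the two lines above together with zlib, a round trip might look like the following minimal sketch. The names compress, decompress, and ALTCHARS are illustrative and not part of the original answer. It assumes the input is bytes and that the token should be returned as a str.

```python
import base64
import zlib

ALTCHARS = b"._"  # stand-ins for '+' and '/' in the standard Base64 alphabet


def compress(data: bytes) -> str:
    """Deflate data, then Base64-encode it with '.' and '_' and no '=' padding."""
    packed = zlib.compress(data, level=9)
    return base64.b64encode(packed, ALTCHARS).rstrip(b"=").decode("ascii")


def decompress(text: str) -> bytes:
    """Reverse compress(): restore the padding, decode, then inflate."""
    enc = text.encode("ascii")
    # Base64 input length must be a multiple of 4;
    # (-len(enc) & 3) is the number of '=' characters needed to reach it.
    enc += b"=" * (-len(enc) & 3)
    return zlib.decompress(base64.b64decode(enc, ALTCHARS))


sample = b"abc" * 100
token = compress(sample)
print(token)
print(decompress(token) == sample)
```

Base64 carries 6 bits per output character versus 5 for Base32, so the same compressed bytes need roughly one sixth fewer characters. For very short inputs such as b"abc", the zlib header and checksum can make the result longer than the input.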