Tags: python, redis, redis-cluster, snappy, redis-py

Compress in Java, decompress in Python - snappy/redis-py-cluster


I am writing a cron script in Python for a Redis cluster, using redis-py-cluster only to read data from a production server. A separate Java application writes to the cluster with Snappy compression and the Java string codec (UTF-8).

I am able to read the data but not to decode it.

from rediscluster import RedisCluster
import snappy


host, port ="127.0.0.1", "30001"
startup_nodes = [{"host": host, "port": port}]
print("Trying connecting to redis cluster host=" + host +  ", port=" + str(port))

rc = RedisCluster(startup_nodes=startup_nodes, max_connections=32, decode_responses=True)
print("Connected",  rc)

print("Reading all keys, value ...\n\n")
for key in rc.scan_iter("uidx:*"):
   value = rc.get(key)
   #uncompress = snappy.uncompress(value, decoding="utf-8")
   print(key, value)
   print('\n')

print("Done. exit()")
exit()

With decode_responses=False the script works as long as the snappy line stays commented out. Changing to decode_responses=True, however, throws the error below. My guess is that it cannot pick the correct decoder.

Traceback (most recent call last):
  File "splooks_cron.py", line 22, in <module>
    print(key, rc.get(key))
  File "/Library/Python/2.7/site-packages/redis/client.py", line 1207, in get
    return self.execute_command('GET', name)
  File "/Library/Python/2.7/site-packages/rediscluster/utils.py", line 101, in inner
    return func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/rediscluster/client.py", line 410, in execute_command
    return self.parse_response(r, command, **kwargs)
  File "/Library/Python/2.7/site-packages/redis/client.py", line 768, in parse_response
    response = connection.read_response()
  File "/Library/Python/2.7/site-packages/redis/connection.py", line 636, in read_response
    raise e
UnicodeDecodeError: 'utf8' codec can't decode byte 0x82 in position 0: invalid start byte

PS: Uncommenting the line uncompress = snappy.uncompress(value, decoding="utf-8") fails with this error:

Traceback (most recent call last):
  File "splooks_cron.py", line 27, in <module>
    uncompress = snappy.uncompress(value, decoding="utf-8")
  File "/Library/Python/2.7/site-packages/snappy/snappy.py", line 91, in uncompress
    return _uncompress(data).decode(decoding)
snappy.UncompressError: Error while decompressing: invalid input

Solution

  • After hours of debugging, I was finally able to solve this.

    I am using the xerial/snappy-java compressor in the Java code that writes to the Redis cluster. The interesting part is that xerial's SnappyOutputStream prepends a header to the compressed data. In my case the stored value starts like this:

    "\x82SNAPPY\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\xb6\x8b\x06\\******actual data here*****
    

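    Because of this prefix, the Python decompressor could not recognize the input as raw Snappy data. In the dump above the first 20 bytes are an 8-byte magic marker, two 4-byte version fields, and a 4-byte length of the compressed chunk. I modified the code as below to strip those bytes from the value, and it works fine now.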

    for key in rc.scan_iter("uidx:*"):
        # rc must be created with decode_responses=False so value is raw bytes
        value = rc.get(key)
        # In my case the prefix was 20 bytes long. decompress() returns bytes;
        # pass decoding="utf-8" to get text instead. See
        # https://github.com/andrix/python-snappy/blob/master/snappy/snappy.py
        uncompress_value = snappy.decompress(value[20:])
        print(key, uncompress_value)
        print('\n')
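
The fixed value[20:] slice only holds while the Java side emits a single chunk. A slightly more defensive variant is sketched below. It assumes the layout read off the dump above: an 8-byte magic marker, two 4-byte big-endian version fields, then one or more chunks, each prefixed by a 4-byte big-endian length. That layout is my reading of the SnappyOutputStream framing, not something confirmed against every snappy-java version. The names split_snappy_java_stream and decompress_snappy_java are made up for this example.

```python
import struct

# Assumed framing, inferred from the byte dump in the answer above.
MAGIC = b"\x82SNAPPY\x00"        # 8-byte marker at the start of the value
HEADER_SIZE = len(MAGIC) + 8     # marker + two 4-byte version fields = 16


def split_snappy_java_stream(data):
    """Return the raw Snappy chunks found after the stream header."""
    if len(data) < HEADER_SIZE or not data.startswith(MAGIC):
        raise ValueError("missing or truncated snappy-java stream header")
    chunks = []
    pos = HEADER_SIZE
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated chunk length")
        # Each chunk is prefixed by its compressed size (big-endian int).
        (size,) = struct.unpack_from(">i", data, pos)
        pos += 4
        if size < 0 or pos + size > len(data):
            raise ValueError("chunk length exceeds remaining data")
        chunks.append(data[pos:pos + size])
        pos += size
    return chunks


def decompress_snappy_java(data, decompress=None):
    """Decompress every chunk and join the results into one bytes value."""
    if decompress is None:
        import snappy  # python-snappy, imported lazily
        decompress = snappy.decompress
    return b"".join(decompress(chunk) for chunk in split_snappy_java_stream(data))
```

With a client created using decode_responses=False, the loop body would become decompress_snappy_java(rc.get(key)).decode("utf-8"). For a single-chunk value this reads the same bytes as value[20:], since 16 header bytes plus the 4-byte chunk length add up to 20.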