
Python encode() and decode() issues


Can anyone help me with this issue, please? The encode method is not working and I cannot work out why.

def encode_OctetString(A,flags,data):
    fs="!"+str(len(data))+"s"
    dbg="Encoding String format:",fs
    logging.debug(dbg)
    ret=struct.pack(fs,data).encode("hex")
    pktlen=8+len(ret)/2
    return encode_finish(A,flags,pktlen,ret)

Error message:

 File "/home/ubuntu/diameter-test/libDiameter.py", line 434, in encode_OctetString
    ret=struct.pack(fs,data).encode("hex")
struct.error: argument for 's' must be a bytes object

Solution

  • Note that struct.pack() returns a bytes object, and bytes objects have no .encode() method, only a .decode() one. Were you trying to re-encode the packed result as "hex"?

    If so, you'll need to import codecs and apply codecs.encode() to the bytes object:

    import struct
    import logging
    import codecs
    
    
    def encode_OctetString(A, flags, data):
        fs = "!" + str(len(data)) + "s"
        dbg = "Encoding String format:", fs
        logging.debug(dbg)
        ret = codecs.encode(struct.pack(fs, data), "hex")
        # // keeps pktlen an int in Python 3 (/ would produce a float)
        pktlen = 8 + len(ret) // 2
        return encode_finish(A, flags, pktlen, ret)
    

    If you instead intended encode_OctetString to work on strings rather than bytes objects (the struct.error in the traceback suggests data is not a bytes object), you'd encode data before packing it:

    def encode_OctetString(A, flags, data):
        fs = "!" + str(len(data)) + "s"
        dbg = "Encoding String format:", fs
        logging.debug(dbg)
        ret = codecs.encode(struct.pack(fs, data.encode()), "hex")
        # // keeps pktlen an int in Python 3 (/ would produce a float)
        pktlen = 8 + len(ret) // 2
        return encode_finish(A, flags, pktlen, ret)
    

    Which version you need depends on the expected type of AVP_Value in encodeAVP. Going by the linked original code, encodeAVP is never called in the shared code, the type of AVP_Value is not documented, and no type hint is provided, so it's impossible to say.

    Edit (after accepted answer): it's important to note the difference between a string and a bytes object. Python makes them look very similar; after all, 'hello'.encode() just becomes b'hello', right?

    But they are very different. A string is a series of characters with no specific encoding you have to worry about. So 'hello' is just an 'h', followed by an 'e', and so on, regardless of how those characters are represented in storage. By contrast, b'hello' is a series of bytes, which can be interpreted as the characters 'hello' only once they are decoded into a string using a known encoding (UTF-8 by default).

    That's why, in this answer, data is encoded first (using UTF-8, as no encoding is specified) so that struct.pack can deal with it (it requires bytes, not a string), and the result is then re-encoded into a hexadecimal representation using the "hex" codec. That's often somewhat confusing, because "hex" is a bytes-to-bytes codec: in Python 3 the result is still a bytes object holding ASCII hex digits, not a str. Call .decode() on it, or use bytes.hex() instead, if you need a str.

    The takeaway should be that bytes are just bytes, which only have meaning once you decide what encoding they are in. Strings (str) are abstract series of characters; their encoding is immaterial. Encoding means taking something with meaning and turning it into the binary that represents that content. Decoding means taking a bunch of binary data (bytes) and interpreting it back into something meaningful, like a string.
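
To make the str/bytes distinction concrete, here is a minimal sketch that can be run on its own. The helper pack_octet_string is hypothetical and exists only for illustration; it is not part of libDiameter and leaves out encode_finish, whose behaviour isn't shown in the question. It assumes that str input should be encoded as UTF-8 before packing.

```python
import codecs
import struct


def pack_octet_string(data):
    """Pack data as a fixed-length byte field and return its hex digits as a str.

    Hypothetical helper for illustration only; not part of libDiameter.
    """
    # struct's "s" format only accepts bytes, so encode str input first
    # (assumption: UTF-8 is the encoding you want).
    if isinstance(data, str):
        data = data.encode("utf-8")
    fs = "!" + str(len(data)) + "s"
    packed = struct.pack(fs, data)
    # bytes.hex() returns a str of hex digits directly.
    return packed.hex()


text = "héllo"
utf8_bytes = text.encode("utf-8")      # "é" takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # "é" takes one byte in Latin-1

# The "hex" codec is bytes-to-bytes: the result is bytes, not str.
hex_bytes = codecs.encode(utf8_bytes, "hex")
hex_text = pack_octet_string(text)

print(len(text), len(utf8_bytes), len(latin1_bytes))
print(type(hex_bytes).__name__, hex_bytes)
print(type(hex_text).__name__, hex_text)
print(utf8_bytes.decode("utf-8") == text)
```

The same five characters come out as six bytes in UTF-8 and five in Latin-1, which is why bytes only mean something once the encoding is known. Also note that the length of the hex representation is always twice the number of packed bytes, which is what the len(ret) / 2 in the original function relies on.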