Do struct.pack and encode Produce the Same Byte String? (Python)


I am confused about the difference between struct.pack() and encode. As I understand it, both return a byte string: struct.pack() converts numbers into bytes, while encode converts a string into bytes. Is that right?

If so, what happens when struct.pack('format', something) and something_else.encode('utf-8') return the same byte string? How can you tell whether that byte string represents a number or a string? For example:

import struct

bstring = b'\xc3\xa9'
a = bstring.decode('utf-8')        # read the two bytes as UTF-8 text
b = struct.unpack('>H', bstring)   # read the same bytes as a big-endian unsigned short
print(a, b)

Output:

é (50089,)   # different conversion methods return different results

Solution

  • If you are given a bytes object such as b'\xc3\xa9', you cannot tell what it represents (a number, a string, or something else) without additional information.

    You could decode it using utf-8:

    >>> b'\xc3\xa9'.decode('utf-8')
    'é'
    

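    You could also decode it using utf-16 and get a different result (with no byte order mark, 'utf-16' falls back to the platform's native byte order; the output below is from a little-endian machine):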

    >>> b'\xc3\xa9'.decode('utf-16')
    '꧃'
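
As a minimal sketch of the same point, the two bytes from the question can be read in several equally valid ways. The variable names are only illustrative, and the byte order is spelled out in each call so the results do not depend on the platform:

```python
import struct

raw = b'\xc3\xa9'

as_utf8 = raw.decode('utf-8')              # 'é'
as_utf16 = raw.decode('utf-16-le')         # '\ua9c3', little-endian stated explicitly
(as_be_uint,) = struct.unpack('>H', raw)   # 50089 == 0xC3A9, big-endian unsigned
(as_le_uint,) = struct.unpack('<H', raw)   # 43459 == 0xA9C3, little-endian unsigned
(as_be_int,) = struct.unpack('>h', raw)    # -15447, big-endian signed

# Going the other way, a string and a number can produce identical bytes.
assert 'é'.encode('utf-8') == struct.pack('>H', 50089) == raw
```

Nothing in the bytes object records which of these readings was intended; that knowledge has to come from the file format, the protocol, or the code that wrote the data.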
    

    encode is a method of str, so yes, it operates only on string objects. Its purpose is to translate a string into a bytes object using the given encoding (e.g. utf-8).

    struct.pack() is NOT limited to converting numbers into a bytes object. It converts any number of Python values (int, bool, float, and even bytes) into a single bytes object laid out according to the format string. In the default native mode ('@'), that layout matches the in-memory layout of the corresponding C struct on the current platform, including alignment padding; the standard modes ('<', '>', '!', '=') use fixed sizes and no padding. You can think of a C struct as analogous to a Python tuple or namedtuple. Indeed, struct.unpack() always returns a tuple.
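
To illustrate packing several values at once, here is a small sketch. The record layout (a 16-bit id, a 32-bit float, and four raw bytes) is a made-up example, not something from the question:

```python
import struct

# Hypothetical layout: unsigned 16-bit id, 32-bit float, 4 raw bytes.
# '<' selects little-endian byte order with standard sizes and no padding.
fmt = '<Hf4s'

packed = struct.pack(fmt, 7, 1.5, b'abcd')
unpacked = struct.unpack(fmt, packed)

print(len(packed))   # 2 + 4 + 4 = 10 bytes
print(unpacked)      # (7, 1.5, b'abcd')

# Native mode ('@', the default) may insert alignment padding, so its size
# is platform dependent, while the standard-mode size is always the same.
standard_size = struct.calcsize('<Hf4s')
native_size = struct.calcsize('@Hf4s')
print(standard_size, native_size)
```

Note that the 's' format takes bytes, not str, so text has to go through encode before it can be packed.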

    struct can be used to load binary data from files or network connections into analogous Python values, whether that data was written by a C program, by Python's struct.pack(), or by some other language's equivalent of struct.pack().
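
The following sketch shows one way such a round trip might look, with io.BytesIO standing in for a file or socket. The record format, the RECORD name, and the write_record/read_record helpers are all assumptions made for the example:

```python
import io
import struct

# Hypothetical fixed-size record: 32-bit unsigned id, 64-bit float,
# and a name stored as 8 bytes of UTF-8 padded with NUL bytes.
RECORD = struct.Struct('<Id8s')   # 4 + 8 + 8 = 20 bytes


def write_record(stream, ident, value, name):
    # 's' accepts only bytes, so the text is encoded first; names longer
    # than 8 encoded bytes would be truncated by pack().
    stream.write(RECORD.pack(ident, value, name.encode('utf-8')))


def read_record(stream):
    data = stream.read(RECORD.size)
    if len(data) < RECORD.size:
        return None   # end of stream (or an incomplete trailing record)
    ident, value, raw_name = RECORD.unpack(data)
    # Strip the NUL padding before decoding the name back into a str.
    return ident, value, raw_name.rstrip(b'\x00').decode('utf-8')


buf = io.BytesIO()
write_record(buf, 1, 2.5, 'café')
write_record(buf, 2, -0.125, 'pi')
buf.seek(0)

# Call read_record repeatedly until it returns None.
records = list(iter(lambda: read_record(buf), None))
print(records)   # [(1, 2.5, 'café'), (2, -0.125, 'pi')]
```

Here struct handles the binary layout, while encode and decode handle only the text field inside it, which is how the two tools usually work together.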