HASHBYTES, sha2_256 in sql introduce bad characters when called from python

One of our old sql legacy code, converts a numerical column in sql using the HASHBYTES function and sha2_256.

The entire process is moving to python as we are putting in some advanced usage on top of the legacy work. However, when using connector, we are calling the same sql code, the HASHBYTES('sha2_256',column_name) id returning values with lot of garbage.

Running the code in sql result in this

Column      Encoded_Column
101286297   0x7AC82B2779116F40A8CEA0D85BE4AA02AF7F813B5383BAC60D5E71B7BDB9F705

Running same sql query from python result in

Column      Encoded_Column
101286297

b"z\xc8+'y\x11o@\xa8\xce\xa0\xd8[\xe4\xaa\x02\xaf\x7f\x81;S\x83\xba\xc6\r^q\xb7\xbd\xb9\xf7\x05"

Code is

Select Column,HASHBYTES('SHA2_256', CONVERT(VARBINARY(8),Column)) as Encoded_Column from table

I have tried usual garbage removal but not helping

Solution

You are getting the right result but is displayed as raw bytes (This is why you have the b in b"...").

Looking at the result from SQL you have the data encoded with hexadecimal.

So to transform the python result you can do:

x = b"z\xc8+'y\x11o@\xa8\xce\xa0\xd8[\xe4\xaa\x02\xaf\x7f\x81;S\x83\xba\xc6\r^q\xb7\xbd\xb9\xf7\x05"

x.hex().upper()

And the result will be:

'7AC82B2779116F40A8CEA0D85BE4AA02AF7F813B5383BAC60D5E71B7BDB9F705'

Which is what you had in SQL.

You can read more here about the 0x at the start of the SQL result that is not present in the python code.

And finally, if you are working with pandas you can convert the whole column with:

df["Encoded_Column"] = df["Encoded_Column"].apply(lambda x: x.hex().upper())

# And if you want the '0x' at the start do:
df["Encoded_Column"] = "0x" + df["Encoded_Column"]