Tags: python, pandas, request, python-requests, hug

How to send a pandas dataframe using POST method and receive it in Hug/other REST API framework? pickle.loads fails to unpickle after sending


How to send a pandas DataFrame using a POST method?

For example, the following hug server listens for POST requests and responds with a pickled pandas DataFrame:

import hug
import pickle
import traceback
import pandas as pd

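@hug.post()
def call(pickle_dump):
    print(type(pickle_dump))
    try:
        df = pickle.loads(pickle_dump)
        # Send back the first row of the received DataFrame
        return pickle.dumps(df.iloc[0])
    except Exception:
        # Log the failure and fall back to an empty DataFrame
        print(traceback.format_exc())
        return pickle.dumps(pd.DataFrame())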

When the following POST request is made:

import pickle
import requests
import pandas as pd

df = pd.DataFrame(pd.np.random.randn(10,20))
r = requests.post('http://localhost:8000/call', data = {'pickle_dump':pickle.dumps(df)})
pickle.loads(r.text)

The server returns this traceback:

<class 'str'>
Traceback (most recent call last):
  File "post.py", line 9, in call
    df = pickle.loads(pickle_dump)
TypeError: a bytes-like object is required, not 'str'

127.0.0.1 - - [23/Jul/2018 17:12:12] "POST /call HTTP/1.1" 200 10

And likewise the client returns:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-292-956952cbfca9> in <module>()
      5 r = requests.post('http://localhost:8000/call', data = {'pickle_dump':pickle.dumps(df)})
      6 
----> 7 pickle.loads(r.text)

TypeError: a bytes-like object is required, not 'str'

This seems to happen because a bytes object sent to the hug API arrives on the server as a str:

For example, pickle.dumps(b'test') returns b'\x80\x03C\x04testq\x00.' on the client. When the hug server receives it, it becomes the str '\x80\x03C\x04testq\x00.' (note the missing b prefix). The object can be decoded back to its original form using pickle.loads('\x80\x03C\x04testq\x00.'.encode()[1:]).

Applying the above process on a DataFrame results in an UnpicklingError:

> pickle.dumps(pd.DataFrame())
b'\x80\x03cpandas.core.frame\nDataFrame\nq\x00)\x81q\x01}q\x02(X\t\x00\x00\x00_metadataq\x03]q\x04X\x04\x00\x00\x00_typq\x05X\t\x00\x00\x00dataframeq\x06X\x05\x00\x00\x00_dataq\x07cpandas.core.internals\nBlockManager\nq\x08)\x81q\t(]q\n(cpandas.core.indexes.base\n_new_Index\nq\x0bcpandas.core.indexes.base\nIndex\nq\x0c}q\r(X\x04\x00\x00\x00nameq\x0eNX\x04\x00\x00\x00dataq\x0fcnumpy.core.multiarray\n_reconstruct\nq\x10cnumpy\nndarray\nq\x11K\x00\x85q\x12C\x01bq\x13\x87q\x14Rq\x15(K\x01K\x00\x85q\x16cnumpy\ndtype\nq\x17X\x02\x00\x00\x00O8q\x18K\x00K\x01\x87q\x19Rq\x1a(K\x03X\x01\x00\x00\x00|q\x1bNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?tq\x1cb\x89]q\x1dtq\x1ebu\x86q\x1fRq h\x0bh\x0c}q!(h\x0eNh\x0fh\x10h\x11K\x00\x85q"h\x13\x87q#Rq$(K\x01K\x00\x85q%h\x1a\x89]q&tq\'bu\x86q(Rq)e]q*]q+}q,X\x06\x00\x00\x000.14.1q-}q.(X\x06\x00\x00\x00blocksq/]q0X\x04\x00\x00\x00axesq1h\nustq2bub.'

Reversing the pickle

pickle.loads('\x80\x03cpandas.core.frame\nDataFrame\nq\x00)\x81q\x01}q\x02(X\t\x00\x00\x00_metadataq\x03]q\x04X\x04\x00\x00\x00_typq\x05X\t\x00\x00\x00dataframeq\x06X\x05\x00\x00\x00_dataq\x07cpandas.core.internals\nBlockManager\nq\x08)\x81q\t(]q\n(cpandas.core.indexes.base\n_new_Index\nq\x0bcpandas.core.indexes.base\nIndex\nq\x0c}q\r(X\x04\x00\x00\x00nameq\x0eNX\x04\x00\x00\x00dataq\x0fcnumpy.core.multiarray\n_reconstruct\nq\x10cnumpy\nndarray\nq\x11K\x00\x85q\x12C\x01bq\x13\x87q\x14Rq\x15(K\x01K\x00\x85q\x16cnumpy\ndtype\nq\x17X\x02\x00\x00\x00O8q\x18K\x00K\x01\x87q\x19Rq\x1a(K\x03X\x01\x00\x00\x00|q\x1bNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?tq\x1cb\x89]q\x1dtq\x1ebu\x86q\x1fRq h\x0bh\x0c}q!(h\x0eNh\x0fh\x10h\x11K\x00\x85q"h\x13\x87q#Rq$(K\x01K\x00\x85q%h\x1a\x89]q&tq\'bu\x86q(Rq)e]q*]q+}q,X\x06\x00\x00\x000.14.1q-}q.(X\x06\x00\x00\x00blocksq/]q0X\x04\x00\x00\x00axesq1h\nustq2bub.'.encode()[1:])

Results in:

---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
<ipython-input-314-7082d60a5569> in <module>()
----> 1 pickle.loads('\x80\x03cpandas.core.frame\nDataFrame\nq\x00)\x81q\x01}q\x02(X\t\x00\x00\x00_metadataq\x03]q\x04X\x04\x00\x00\x00_typq\x05X\t\x00\x00\x00dataframeq\x06X\x05\x00\x00\x00_dataq\x07cpandas.core.internals\nBlockManager\nq\x08)\x81q\t(]q\n(cpandas.core.indexes.base\n_new_Index\nq\x0bcpandas.core.indexes.base\nIndex\nq\x0c}q\r(X\x04\x00\x00\x00nameq\x0eNX\x04\x00\x00\x00dataq\x0fcnumpy.core.multiarray\n_reconstruct\nq\x10cnumpy\nndarray\nq\x11K\x00\x85q\x12C\x01bq\x13\x87q\x14Rq\x15(K\x01K\x00\x85q\x16cnumpy\ndtype\nq\x17X\x02\x00\x00\x00O8q\x18K\x00K\x01\x87q\x19Rq\x1a(K\x03X\x01\x00\x00\x00|q\x1bNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?tq\x1cb\x89]q\x1dtq\x1ebu\x86q\x1fRq h\x0bh\x0c}q!(h\x0eNh\x0fh\x10h\x11K\x00\x85q"h\x13\x87q#Rq$(K\x01K\x00\x85q%h\x1a\x89]q&tq\'bu\x86q(Rq)e]q*]q+}q,X\x06\x00\x00\x000.14.1q-}q.(X\x06\x00\x00\x00blocksq/]q0X\x04\x00\x00\x00axesq1h\nustq2bub.'.encode()[1:])

UnpicklingError: 

I am open to using any framework which will allow me to send and receive a pandas DataFrame using HTTP requests.

Both the server and the client are run in the same environment with identical package versions.

How to send and receive a pandas DataFrame using HTTP methods?


Solution

  • Base64-encoding the pickled bytes appears to resolve the issue. For brevity, I will use an example to demonstrate.

    Suppose I have the following dataframe:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'a': [0, 1, 2, 3]})
    >>> df
       a
    0  0
    1  1
    2  2
    3  3
    

    Now, let's pickle the object to bytes, and then base64-encode the pickled bytes:

    >>> import pickle
    >>> pickled = pickle.dumps(df)
    >>> import base64
    >>> pickled_b64 = base64.b64encode(pickled)
    >>> pickled_b64
    b'gANjcGFuZGFzLmNvcmUuZnJhbWUKRGF0YUZyYW1lCnEAKYFxAX1xAihYCQAAAF9tZXRhZGF0YXEDXXEEWAQAAABfdHlwcQVYCQAAAGRhdGFmcmFtZXEGWAUAAABfZGF0YXEHY3BhbmRhcy5jb3JlLmludGVybmFscwpCbG9ja01hbmFnZXIKcQgpgXEJKF1xCihjcGFuZGFzLmNvcmUuaW5kZXhlcy5iYXNlCl9uZXdfSW5kZXgKcQtjcGFuZGFzLmNvcmUuaW5kZXhlcy5iYXNlCkluZGV4CnEMfXENKFgEAAAAbmFtZXEOTlgEAAAAZGF0YXEPY251bXB5LmNvcmUubXVsdGlhcnJheQpfcmVjb25zdHJ1Y3QKcRBjbnVtcHkKbmRhcnJheQpxEUsAhXESQwFicROHcRRScRUoSwFLAYVxFmNudW1weQpkdHlwZQpxF1gCAAAATzhxGEsASwGHcRlScRooSwNYAQAAAHxxG05OTkr/////Sv////9LP3RxHGKJXXEdWAEAAABhcR5hdHEfYnWGcSBScSFoC2NwYW5kYXMuY29yZS5pbmRleGVzLnJhbmdlClJhbmdlSW5kZXgKcSJ9cSMoaA5OWAUAAABzdGFydHEkSwBYBAAAAHN0b3BxJUsEWAQAAABzdGVwcSZLAXWGcSdScShlXXEpaBBoEUsAhXEqaBOHcStScSwoSwFLAUsEhnEtaBdYAgAAAGk4cS5LAEsBh3EvUnEwKEsDWAEAAAA8cTFOTk5K/////0r/////SwB0cTJiiUMgAAAAAAAAAAABAAAAAAAAAAIAAAAAAAAAAwAAAAAAAABxM3RxNGJhXXE1aAtoDH1xNihoDk5oD2gQaBFLAIVxN2gTh3E4UnE5KEsBSwGFcTpoGoldcTtoHmF0cTxidYZxPVJxPmF9cT9YBgAAADAuMTQuMXFAfXFBKFgGAAAAYmxvY2tzcUJdcUN9cUQoWAgAAABtZ3JfbG9jc3FFY2J1aWx0aW5zCnNsaWNlCnFGSwBLAUsBh3FHUnFIWAYAAAB2YWx1ZXNxSWgsdWFYBAAAAGF4ZXNxSmgKdXN0cUtidWIu'
    

    The base64-encoded value is also a bytes object, but it contains only ASCII characters and none of the \x escape sequences, so it survives being converted to a str and encoded back to bytes unchanged.
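
    To make the difference concrete, the sketch below compares the two encodings. It assumes, based on the str shown in the question, that the server sees each raw byte as one character. That is modelled here with a latin-1 decode, which is an assumption about hug's behaviour and not something confirmed from its source. The byte strings are copied from the question, and the names via_text, small, and frag are illustrative.

```python
import base64
import pickle

# Byte strings copied from the question: a complete pickle of b'test' and
# the first bytes of the DataFrame pickle (a fragment, never unpickled here).
small = b'\x80\x03C\x04testq\x00.'
frag = b'\x80\x03cpandas.core.frame\nDataFrame\nq\x00)\x81q\x01}q\x02'


def via_text(payload: bytes) -> bytes:
    """Model the round trip: bytes -> str (one char per byte) -> UTF-8 bytes.

    The latin-1 decode is an assumption that reproduces the str in the question.
    """
    return payload.decode('latin-1').encode()  # str.encode() defaults to UTF-8


# Each byte >= 0x80 comes back as two bytes, so dropping one leading byte
# only repairs a payload whose single high byte is the first one.
repaired_small = via_text(small)[1:]
repaired_frag = via_text(frag)[1:]
print(repaired_small == small)            # True
print(repaired_frag == frag)              # False
print(len(via_text(frag)) - len(frag))    # 2

# Base64 output is pure ASCII, so the same round trip leaves it unchanged.
b64 = base64.b64encode(small)
print(via_text(b64) == b64)                           # True
print(pickle.loads(base64.b64decode(via_text(b64))))  # b'test'
```

    Under that assumption, the [1:] trick from the question works for b'test' only because its pickle has a single high byte at position 0. The DataFrame pickle has many (\x81, \x85, \xff, and so on), so the re-encoded bytes no longer form a valid pickle.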

    Now, let's mimic what hug does to the string, as you have noted:

    >>> hug_pickled_str = pickled_b64.decode('utf-8')
    >>> hug_pickled_str
    'gANjcGFuZGFzLmNvcmUuZnJhbWUKRGF0YUZyYW1lCnEAKYFxAX1xAihYCQAAAF9tZXRhZGF0YXEDXXEEWAQAAABfdHlwcQVYCQAAAGRhdGFmcmFtZXEGWAUAAABfZGF0YXEHY3BhbmRhcy5jb3JlLmludGVybmFscwpCbG9ja01hbmFnZXIKcQgpgXEJKF1xCihjcGFuZGFzLmNvcmUuaW5kZXhlcy5iYXNlCl9uZXdfSW5kZXgKcQtjcGFuZGFzLmNvcmUuaW5kZXhlcy5iYXNlCkluZGV4CnEMfXENKFgEAAAAbmFtZXEOTlgEAAAAZGF0YXEPY251bXB5LmNvcmUubXVsdGlhcnJheQpfcmVjb25zdHJ1Y3QKcRBjbnVtcHkKbmRhcnJheQpxEUsAhXESQwFicROHcRRScRUoSwFLAYVxFmNudW1weQpkdHlwZQpxF1gCAAAATzhxGEsASwGHcRlScRooSwNYAQAAAHxxG05OTkr/////Sv////9LP3RxHGKJXXEdWAEAAABhcR5hdHEfYnWGcSBScSFoC2NwYW5kYXMuY29yZS5pbmRleGVzLnJhbmdlClJhbmdlSW5kZXgKcSJ9cSMoaA5OWAUAAABzdGFydHEkSwBYBAAAAHN0b3BxJUsEWAQAAABzdGVwcSZLAXWGcSdScShlXXEpaBBoEUsAhXEqaBOHcStScSwoSwFLAUsEhnEtaBdYAgAAAGk4cS5LAEsBh3EvUnEwKEsDWAEAAAA8cTFOTk5K/////0r/////SwB0cTJiiUMgAAAAAAAAAAABAAAAAAAAAAIAAAAAAAAAAwAAAAAAAABxM3RxNGJhXXE1aAtoDH1xNihoDk5oD2gQaBFLAIVxN2gTh3E4UnE5KEsBSwGFcTpoGoldcTtoHmF0cTxidYZxPVJxPmF9cT9YBgAAADAuMTQuMXFAfXFBKFgGAAAAYmxvY2tzcUJdcUN9cUQoWAgAAABtZ3JfbG9jc3FFY2J1aWx0aW5zCnNsaWNlCnFGSwBLAUsBh3FHUnFIWAYAAAB2YWx1ZXNxSWgsdWFYBAAAAGF4ZXNxSmgKdXN0cUtidWIu'
    

    Now to make the string consumable on the server-side:

    >>> ss_df = pickle.loads(base64.b64decode(hug_pickled_str.encode()))
    >>> ss_df
       a
    0  0
    1  1
    2  2
    3  3
    

    Therefore, you need to base64-encode the pickled bytes on the client, pass the resulting string as the data to your API, and base64-decode it on the server before calling pickle.loads.
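
    The recipe can be wrapped in two small helpers. This is a minimal sketch: the names to_wire and from_wire are hypothetical and not part of hug, requests, or pandas. The example uses a plain dict so that it runs without pandas installed; a DataFrame can be passed the same way.

```python
import base64
import pickle


def to_wire(obj) -> str:
    """Pickle obj and return it as an ASCII-only base64 string."""
    return base64.b64encode(pickle.dumps(obj)).decode('ascii')


def from_wire(text: str):
    """Reverse to_wire. Only call this on data from a trusted sender,
    because unpickling can execute arbitrary code."""
    return pickle.loads(base64.b64decode(text.encode('ascii')))


# Stand-in for a DataFrame so the sketch has no third-party dependencies.
payload = {'a': [0, 1, 2, 3]}
wire = to_wire(payload)
restored = from_wire(wire)
print(restored == payload)  # True
```

    With the code from the question, the client would send data={'pickle_dump': to_wire(df)} and the server would call from_wire(pickle_dump). The same pair can be applied to the return value, although how the response body is wrapped depends on hug's output format.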