Tags: python, dataframe, tcp, lan

Share pandas.DataFrame between Windows and Linux?


Our company has a Windows-only API (the server), and I've written some Python code that converts the raw data to a pd.DataFrame on the server. I'd like to send this DataFrame to another Python program running on CentOS 7 (the client). I would be grateful if anyone could suggest a solution.

Based on my research, I built a socket server on Windows. The DataFrame looks like this:

Date  | Ticker1 | Ticker2 |
---------------------------
May11 | 100.01  | 143.12  |

Here is my code on the server:

records_to_send = df.to_records().tostring()
conn.send(records_to_send)

But when it comes to decoding, np.frombuffer() won't accept the dtype I pass. Even running just the following code on the server:

np.frombuffer(df.to_records().tostring(), df.to_records().dtype)

raises a ValueError:

ValueError: cannot create an OBJECT array from memory buffer

Solution
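
Some background on the error may help. The ValueError most likely comes from the Date column: assuming it holds Python strings, pandas stores it with object dtype, so each record contains a pointer rather than the text itself, and np.frombuffer() refuses to build object arrays from raw bytes. The sketch below reproduces this with a one-row frame modeled on the question (the sample values and variable names are illustrative, not the asker's actual data):

```python
import numpy as np
import pandas as pd

# One-row frame modeled on the question; Date is forced to object dtype
# (an assumption about how the original column was stored).
df = pd.DataFrame({
    "Date": pd.Series(["May11"], dtype=object),
    "Ticker1": [100.01],
    "Ticker2": [143.12],
})

records = df.to_records(index=False)

# True when at least one field of the record dtype is a Python object.
has_object_field = records.dtype.hasobject

# Rebuilding the array from raw bytes fails for such a dtype.
try:
    np.frombuffer(records.tobytes(), dtype=records.dtype)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(has_object_field)
print(error_message)
```

A text format such as JSON avoids the problem because the strings themselves are written out instead of in-memory pointers.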

  • Pandas provides methods to serialize to JSON. Use the DataFrame's to_json method to serialize it, and pandas' read_json function to read the result back into a DataFrame:

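    import io
    
    import numpy as np
    import pandas as pd
    
    # Build a sample 10x4 frame of random integers in [0, 100)
    df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))
    
    # Serialize to a JSON string (this is what would be sent to the client)
    serialized = df.to_json()
    print(serialized)
    
    # Recent pandas versions expect a file-like object rather than a literal JSON string
    deserialized = pd.read_json(io.StringIO(serialized))
    print(deserialized)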
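
To get the JSON string from the Windows server to the CentOS client, it still has to be framed on the socket, since TCP is a byte stream with no message boundaries. Here is a minimal sketch, assuming an 8-byte length prefix (a convention chosen for this example, not something from the question) and hypothetical helper names send_dataframe and recv_dataframe. socket.socketpair() stands in for the real server/client connection so the example runs on one machine:

```python
import io
import socket
import struct

import pandas as pd

# 8-byte big-endian unsigned length prefix (an assumed framing convention).
HEADER = struct.Struct("!Q")


def send_dataframe(sock, df):
    """Serialize df to JSON and send it with a length prefix."""
    payload = df.to_json(orient="split").encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exactly(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_dataframe(sock):
    """Receive one length-prefixed JSON message and rebuild the DataFrame."""
    (length,) = HEADER.unpack(_recv_exactly(sock, HEADER.size))
    text = _recv_exactly(sock, length).decode("utf-8")
    return pd.read_json(io.StringIO(text), orient="split")


# Local demonstration: a connected socket pair replaces the real network link.
server_side, client_side = socket.socketpair()
df = pd.DataFrame({"Ticker1": [100.01, 101.5], "Ticker2": [143.12, 144.0]})
send_dataframe(server_side, df)
received = recv_dataframe(client_side)
server_side.close()
client_side.close()
print(received)
```

In the real setup, the server would call send_dataframe(conn, df) on the accepted connection and the client would call recv_dataframe(sock) after connecting. orient="split" is used here because it writes the columns, index, and data as separate lists; string columns with date-like names (such as Date) may need convert_dates=False on the reading side to stay as plain text.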