
How do I prevent a numpy ndarray column from being converted to string when saving a Pandas DataFrame to csv?


I have a DataFrame with an "ID" column and a "Vector" column whose cells are numpy arrays of shape (1, 500). I have to save the DataFrame as csv. When I read the saved csv back into a DataFrame, each array has become a string and I can no longer use it with my functions.

For example, before saving, a cell of the Vector column looks like this:

>>DataFrame_example["Vector"][0]
Out:

array([[-4.51561287e-02, -5.02060959e-03,  1.01038935e-02,
        -3.24810972e-03,  8.50208327e-02, -3.12430300e-02,
        -3.06447037e-02, -6.82420060e-02,  4.08798642e-02
             ...........................................
        -6.08731210e-02,  4.24617827e-02,  2.90670991e-02,
         1.87119041e-02,  5.67540973e-02,  4.65381369e-02,
         3.42479758e-02,  9.88676678e-03, -1.62497200e-02,
         1.46159781e-02, -6.39008060e-02]], dtype=float32)


>>type(DataFrame_example["Vector"][0])
Out: numpy.ndarray


But after saving as csv and reading it back, the same cell becomes:

>>DataFrame_example["Vector"][0]

'[[-4.51561287e-02 -5.02060959e-03  1.01038935e-02 -3.24810972e-03\n   8.50208327e-02 -3.12430300e-02 -3.06447037e-02 -6.82420060e-02\n   4.08798642e-02  2.49120360e-03 -6.40684515e-02  
 ............................................................................................
-5.22072986e-02\n   6.16791770e-02 -8.88353493e-03  1.65628344e-02 -5.95084354e-02\n  -8.45786110e-02 -8.65871832e-03  3.98499370e-02 -3.41838486e-02\n  -2.02250257e-02  5.18149361e-02 -5.80132604e-02  7.66506651e-03\n  -5.49656115e-02 -6.08731210e-02  4.24617827e-02  2.90670991e-02\n   1.87119041e-02  5.67540973e-02  4.65381369e-02  3.42479758e-02\n   9.88676678e-03 -1.62497200e-02  1.46159781e-02 -6.39008060e-02]]'

How can I keep the original format? Any help would be appreciated.

I am saving the DataFrame in csv format with:

compression_opts = dict(method='zip',
                        archive_name=save_name+'.csv')
DataFrame_example.to_csv(save_name+'.zip', index=False,
          compression=compression_opts)  
 

I am reading it with:

DataFrame_example=read_csv("example.csv")

I have also tried reading it with delimiter="," or sep=",".


Solution

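  • CSV is a plain-text format, so to_csv writes each array as its string representation and read_csv hands that string back. You can use pandas.DataFrame.to_pickle instead, which stores the Python objects themselves: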

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [1,2,3,4], 'b' : [np.array([5,6,7,8]), np.array([5,6,7,8]),np.array([5,6,7,8]),np.array([5,6,7,8])]})
    
    #   a             b
    #0  1  [5, 6, 7, 8]
    #1  2  [5, 6, 7, 8]
    #2  3  [5, 6, 7, 8]
    #3  4  [5, 6, 7, 8]
    
    type(df.b[0])
    #<class 'numpy.ndarray'>
    
    # pickle is a binary format, so a .pkl extension is less misleading than .txt
    df.to_pickle("out.pkl")
    
    new = pd.read_pickle("out.pkl")
    type(new.b[0])
    #<class 'numpy.ndarray'>
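
If the file has to stay a csv, one possible workaround is to encode each array as JSON text before saving and decode it again while reading. The following is only a sketch under assumptions: the small DataFrame, the in-memory buffer, and the helper name parse_array_string are made up for illustration and do not come from the question. The helper shows one way to recover arrays from a csv that was already written with the default string representation; it assumes the arrays were short enough that numpy did not abbreviate them with "..." when printing.

```python
import io
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: (1, 5) float32 arrays per cell.
df = pd.DataFrame({
    "ID": [1, 2],
    "Vector": [
        np.arange(5, dtype=np.float32).reshape(1, 5),
        np.linspace(-1, 1, 5, dtype=np.float32).reshape(1, 5),
    ],
})

# Encode every array as a JSON list so the csv cell holds unambiguous text.
encoded = df.copy()
encoded["Vector"] = encoded["Vector"].apply(lambda a: json.dumps(a.tolist()))

buffer = io.StringIO()  # stands in for a file on disk
encoded.to_csv(buffer, index=False)
buffer.seek(0)

# Decode the JSON text back into float32 arrays while reading.
restored = pd.read_csv(
    buffer,
    converters={"Vector": lambda s: np.array(json.loads(s), dtype=np.float32)},
)


def parse_array_string(text, dtype=np.float32):
    """Rebuild a (1, n) array from numpy's default string form, e.g. '[[ 1.5 -2. ]]'."""
    tokens = text.replace("[", " ").replace("]", " ").split()
    return np.array([float(t) for t in tokens], dtype=dtype).reshape(1, -1)


# Recover an array from the kind of string the question ended up with.
recovered = parse_array_string(str(df["Vector"][1]))
```

For a csv that already exists, the same helper could be passed to read_csv through converters={"Vector": parse_array_string}. Note that the JSON route keeps the nested shape, while the helper always returns a single row.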