I have a DataFrame that includes an "ID" column and a "Vector" column (each entry is an array of shape (1, 500)). I have to save the DF as CSV. When I read the saved CSV back into a DataFrame, each array becomes a string and I can no longer use it with my functions.
For example, before saving the DF, an entry of the Vector column looks like this:
>>DataFrame_example["Vector"][0]
Out:
array([[-4.51561287e-02, -5.02060959e-03, 1.01038935e-02,
-3.24810972e-03, 8.50208327e-02, -3.12430300e-02,
-3.06447037e-02, -6.82420060e-02, 4.08798642e-02
...........................................
-6.08731210e-02, 4.24617827e-02, 2.90670991e-02,
1.87119041e-02, 5.67540973e-02, 4.65381369e-02,
3.42479758e-02, 9.88676678e-03, -1.62497200e-02,
1.46159781e-02, -6.39008060e-02]], dtype=float32)
>>type(DataFrame_example["Vector"][0])
Out: numpy.ndarray
But after saving as CSV and reading it back, the same code outputs:
>>DataFrame_example["Vector"][0]
'[[-4.51561287e-02 -5.02060959e-03 1.01038935e-02 -3.24810972e-03\n 8.50208327e-02 -3.12430300e-02 -3.06447037e-02 -6.82420060e-02\n 4.08798642e-02 2.49120360e-03 -6.40684515e-02
............................................................................................
-5.22072986e-02\n 6.16791770e-02 -8.88353493e-03 1.65628344e-02 -5.95084354e-02\n -8.45786110e-02 -8.65871832e-03 3.98499370e-02 -3.41838486e-02\n -2.02250257e-02 5.18149361e-02 -5.80132604e-02 7.66506651e-03\n -5.49656115e-02 -6.08731210e-02 4.24617827e-02 2.90670991e-02\n 1.87119041e-02 5.67540973e-02 4.65381369e-02 3.42479758e-02\n 9.88676678e-03 -1.62497200e-02 1.46159781e-02 -6.39008060e-02]]'
How can I keep the format? Any help would be appreciated.
I am saving the DF in CSV format:
compression_opts = dict(method='zip',
archive_name=save_name+'.csv')
DataFrame_example.to_csv(save_name+'.zip', index=False,
compression=compression_opts)
I am reading it with:
DataFrame_example=read_csv("example.csv")
I have also tried reading it with delimiter="," or sep=",".
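You can use pandas.DataFrame.to_pickle instead. CSV is a text format, so each array is written out as its printed string; pickle stores the Python objects themselves, so the arrays come back as numpy.ndarray: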
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.array([5, 6, 7, 8]), np.array([5, 6, 7, 8]), np.array([5, 6, 7, 8]), np.array([5, 6, 7, 8])]})
# a b
#0 1 [5, 6, 7, 8]
#1 2 [5, 6, 7, 8]
#2 3 [5, 6, 7, 8]
#3 4 [5, 6, 7, 8]
type(df.b[0])
#<class 'numpy.ndarray'>
df.to_pickle("out.txt")
new = pd.read_pickle("out.txt")
type(new.b[0])
#<class 'numpy.ndarray'>
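If the file really has to stay CSV, as the question says, one possible workaround is to convert the column back while reading, using the converters argument of read_csv. The sketch below is only an illustration under assumptions: parse_vector is a hypothetical helper (not a pandas or NumPy function), the small 4-element frame stands in for the real (1, 500) data, and io.StringIO replaces the file on disk. It shows two variants: parsing the printed form that is already in an existing CSV, and writing the arrays as JSON lists so they can be read back without ad hoc parsing.

```python
import io
import json

import numpy as np
import pandas as pd


def parse_vector(text):
    """Hypothetical helper: turn the printed form '[[a b c ...]]' into a (1, n) float32 array."""
    # Drop the brackets, then split on any whitespace (including the embedded newlines).
    numbers = [float(tok) for tok in text.replace("[", " ").replace("]", " ").split()]
    return np.array(numbers, dtype=np.float32).reshape(1, -1)


# Small stand-in for the real DataFrame (assumed layout: "ID" plus (1, n) arrays).
df = pd.DataFrame({
    "ID": [1, 2],
    "Vector": [
        np.arange(4, dtype=np.float32).reshape(1, 4),
        np.ones((1, 4), dtype=np.float32),
    ],
})

# Variant 1: the CSV was written directly, so it holds the printed arrays.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer, converters={"Vector": parse_vector})

# Variant 2: write each array as a JSON list, then decode it on read.
as_json = df.assign(Vector=df["Vector"].apply(lambda a: json.dumps(a.tolist())))
buffer_json = io.StringIO()
as_json.to_csv(buffer_json, index=False)
buffer_json.seek(0)
restored_json = pd.read_csv(
    buffer_json,
    converters={"Vector": lambda s: np.array(json.loads(s), dtype=np.float32)},
)

print(type(restored["Vector"][0]), restored["Vector"][0].shape)
print(type(restored_json["Vector"][0]), restored_json["Vector"][0].shape)
```

Variant 1 relies on NumPy's printed form, which is rounded to the print precision and, with default print options, abbreviated with "..." for arrays of more than 1000 elements, so it may not reproduce the original values exactly. Variant 2 avoids both issues at the cost of a larger file.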