I am having a hard time reading in an excel file with pandas DataFrame
and converting a stored matrix to a numpy array
. I think part of the issue is that the matrix is improperly stored. I have no control over the spreadsheet however, this is how it was sent to me.
For instance this is the string stored in a cell
[[[ 0. 0. 0.107851]
[ 0. 0. -0.862809]]]
I read in the row with DataFrame
, and save each cell to a variable. I then try to convert this particular variable to a np.array
since those number represent two sets of x, y, z coordinates.
I have tried np.fromstring
and np.asarray
to no avail. It will convert the string to a numpy array but it will be a terrible mess with the brackets still inside as characters. I have tried using np.squeeze to get rid of the brackets but it says dimension is not 1.
if I use np.asarray(item._coord, dtype=float)
then it fails saying it cannot convert the string to float.
ValueError: could not convert string to float: '[[[ 0. 0. 0.107851] [ 0. 0. -0.862809]]]'
There is a '\n' that shows up in the middle of it, between the two lists. I use df = df.replace(r'\n', ' ',regex=True)' to clean out the
\n`'s prior to data conversion attempts.
I am stuck
Use custom function for convert to numpy array
after read_excel
:
a= np.array([[[ 0., 0., 0.107851],
[ 0., 0., -0.862809]]])
print (a)
[[[ 0. 0. 0.107851]
[ 0. 0. -0.862809]]]
df = pd.DataFrame({'col':[a,a,a]})
print (df)
col
0 [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
1 [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
2 [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
df.to_excel('test.xlsx', index=False)
import re
import ast
import numpy as np
#https://stackoverflow.com/a/44323021
def str2array(s):
# Remove space after [
s=re.sub('\[ +', '[', s.strip())
# Replace commas and spaces
s=re.sub('[,\s]+', ', ', s)
return np.array(ast.literal_eval(s))
df = pd.read_excel('test.xlsx')
df['col'] = df['col'].apply(str2array)
print (df)
col
0 [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
1 [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
2 [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]