I have checked different solutions, but could not understand how to apply them to multidimensional arrays. To be precise, my code results in a larger array than it should be, as shown in the picture below:
import h5py
import pandas as pd
import numpy as np
data = [[1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861], [1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861], [1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861], [1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861], [1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861], [1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861], [1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861], [1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861], [1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861], [1583663558450195, -7.063664436340332, -6.2776079177856445, -4.206898212432861, -4.206898212432861]]
df = pd.DataFrame(data)
hf = h5py.File('dtype.h5', 'w')
dataTypes = np.dtype([('ts', 'u8'), ('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('temp', 'f4')])
ds = hf.create_dataset('Acceleration', data=df.astype(dataTypes))
I would like it to look like this, where the first column is uint64 and the other four are float32:
ts x y z temp
0 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
1 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
2 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
3 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
4 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
5 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
6 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
7 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
8 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
9 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
Your df:
In [370]: df
Out[370]:
0 1 2 3 4
0 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
1 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
2 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
3 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
...
df.astype(dataTypes) gives me a TypeError (my pandas isn't the latest).
In [373]: df.to_records()
Out[373]:
rec.array([(0, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821),
(1, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821),
(2, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821),
(3, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821),
(4, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821),
(5, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821),
(6, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821),
(7, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821),
(8, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821),
(9, 1583663558450195, -7.06366444, -6.27760792, -4.20689821, -4.20689821)],
dtype=[('index', '<i8'), ('0', '<i8'), ('1', '<f8'), ('2', '<f8'), ('3', '<f8'), ('4', '<f8')])
This array should save with h5py.
to_records has parameters that may create something closer to your dataTypes. I'll let you explore those.
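As a starting point for that exploration, here is a minimal sketch using the index and column_dtypes parameters of to_records (available in reasonably recent pandas versions). Naming the DataFrame columns after the dtype fields is my assumption, not something in the original code; it is only there so the record fields come out as ts, x, y, z, temp rather than 0..4.

```python
import numpy as np
import pandas as pd

# One row from the question, repeated ten times.
row = [1583663558450195, -7.063664436340332, -6.2776079177856445,
       -4.206898212432861, -4.206898212432861]
data = [row] * 10

dataTypes = np.dtype([('ts', 'u8'), ('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('temp', 'f4')])

# Assumption: give the columns the same names as the dtype fields.
df = pd.DataFrame(data, columns=list(dataTypes.names))

# index=False drops the 'index' field; column_dtypes sets a dtype per column.
rec = df.to_records(
    index=False,
    column_dtypes={name: dataTypes[name] for name in dataTypes.names},
)

print(rec.dtype)
print(rec.shape)
```

The result is a 1d record array with one record per row, which is the layout the question is after, rather than a 2d array in which every cell is itself a full record.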
But with the recent restructuring of recfunctions, we can make a structured array with:
In [385]: import numpy.lib.recfunctions as rf
In [386]: rf.unstructured_to_structured(np.array(data), dataTypes)
Out[386]:
array([(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898),
(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898),
(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898),
(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898),
(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898),
(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898),
(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898),
(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898),
(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898),
(1583663558450195, -7.0636644, -6.277608, -4.206898, -4.206898)],
dtype=[('ts', '<u8'), ('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('temp', '<f4')])
np.array(data) is a (10,5) float array.
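One thing to keep in mind with that float intermediate: float64 represents integers exactly only up to 2**53. The timestamp in the question is well below that, so it survives the round trip, but a larger value would not. The sketch below uses a made-up timestamp (big_ts, purely hypothetical) to show the difference, and builds the structured array from a list of tuples so the integers never pass through a float array.

```python
import numpy as np

dataTypes = np.dtype([('ts', 'u8'), ('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('temp', 'f4')])

# Hypothetical timestamp just above the float64 exact-integer limit.
big_ts = 2**53 + 1
rows = [[big_ts, -7.063664436340332, -6.2776079177856445,
         -4.206898212432861, -4.206898212432861]]

# Mixed ints and floats become a float64 array; big_ts gets rounded.
as_float = np.array(rows)
print(as_float.dtype, int(as_float[0, 0]) == big_ts)

# A list of tuples with a structured dtype converts field by field,
# so the integer goes straight into the uint64 field.
arr = np.array([tuple(r) for r in rows], dtype=dataTypes)
print(int(arr['ts'][0]) == big_ts)
```

For microsecond timestamps like the ones in the question this is not a problem in practice; it only starts to matter for something like nanosecond-resolution values.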
In [388]: pd.DataFrame(_386)
Out[388]:
ts x y z temp
0 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
1 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
2 1583663558450195 -7.063664 -6.277608 -4.206898 -4.206898
...
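To tie this back to the original goal, a sketch of writing the structured array to an HDF5 file and reading it back. I have not run this against your exact setup; the temporary directory is just my choice to keep the example self-contained, while the file and dataset names are taken from the question.

```python
import os
import tempfile

import h5py
import numpy as np
import numpy.lib.recfunctions as rf

row = [1583663558450195, -7.063664436340332, -6.2776079177856445,
       -4.206898212432861, -4.206898212432861]
data = [row] * 10

dataTypes = np.dtype([('ts', 'u8'), ('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('temp', 'f4')])

# (10,5) float array -> (10,) structured array with named fields.
arr = rf.unstructured_to_structured(np.array(data), dataTypes)

# Assumption: write to a temporary directory rather than the working directory.
path = os.path.join(tempfile.mkdtemp(), 'dtype.h5')
with h5py.File(path, 'w') as hf:
    hf.create_dataset('Acceleration', data=arr)

# Read the whole dataset back as a numpy structured array.
with h5py.File(path, 'r') as hf:
    out = hf['Acceleration'][:]

print(out.shape)
print(out.dtype)
```

The dataset should come back as a 1d compound dataset with ten records, one field per column, instead of the larger array that the astype approach produced.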