python hdf5 h5py

Storing a list of strings to an HDF5 Dataset from Python


I am trying to store a variable-length list of strings in an HDF5 Dataset. The code for this is:

import h5py

h5File = h5py.File('xxx.h5', 'w')
strList = ['asas', 'asas', 'asas']
# This call raises the TypeError quoted below
h5File.create_dataset('xxx', (len(strList), 1), 'S10', strList)
h5File.flush()
h5File.close()

I am getting the error "TypeError: No conversion path for dtype: dtype('<U3')".
How can I solve this problem?


Solution

  • The strings in your list are Unicode, but the dtype you specified, 'S10', is a fixed-width byte-string (ASCII) type. According to the h5py wiki, h5py does not currently support this conversion.

    You'll need to encode the strings in a format h5py handles:

    asciiList = [n.encode("ascii", "ignore") for n in strList]
    h5File.create_dataset('xxx', (len(asciiList),1),'S10', asciiList)
    

    Note: not every Unicode string can be encoded in ASCII! With the "ignore" error handler, any non-ASCII characters are silently dropped from the stored value.
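
The encoding step and its caveat can be sketched as a small helper. This is only an illustration under stated assumptions: the names `to_ascii_bytes` and `save_strings` are hypothetical (they are not part of h5py), the width of 10 simply mirrors the 'S10' dtype from the question, and strict encoding is used by default so that non-ASCII input fails loudly instead of losing characters.

```python
def to_ascii_bytes(strings, width=10, errors="strict"):
    """Encode str items to ASCII bytes that fit a fixed-width 'S<width>' dtype.

    Hypothetical helper for illustration; not part of h5py.
    """
    # errors="strict" raises UnicodeEncodeError on non-ASCII characters;
    # errors="ignore" (as in the answer) drops them silently instead.
    encoded = [s.encode("ascii", errors) for s in strings]
    too_long = [b for b in encoded if len(b) > width]
    if too_long:
        raise ValueError(f"values longer than {width} bytes: {too_long!r}")
    return encoded


def save_strings(path, name, strings, width=10):
    """Write the strings as an (N, 1) fixed-width ASCII dataset."""
    # Third-party dependency, imported here so that to_ascii_bytes
    # stays usable without h5py installed.
    import h5py

    data = to_ascii_bytes(strings, width)
    # The context manager closes the file even if create_dataset fails.
    with h5py.File(path, "w") as h5_file:
        h5_file.create_dataset(
            name, shape=(len(data), 1), dtype=f"S{width}", data=data
        )
```

Calling `save_strings('xxx.h5', 'xxx', strList)` should write the same dataset as the answer's snippet. Passing `errors="ignore"` to `to_ascii_bytes` reproduces the answer's behaviour, so `'caf\u00e9'` would be stored as `b'caf'`.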