Search code examples
pythonmatlabhdf5nastran

PyTables vs Matlab HDF5 read times


I have an HDF5 output file from NASTRAN that contains mode shape data. I am trying to read them into Matlab and Python to check various post-processing techniques. The file in question is in the local directory for both of these tests. The file is semi-large at 1.2 GB but certainly not that large in terms of HDF5 files I have read previously. There are 17567342 rows and 8 columns in the table I want to access. The first and last columns are integers the middle 6 are floating point numbers.

Matlab:

file = 'HDF5.h5';
hinfo = hdf5info(file);
% ... Find the dataset I want to extract
t = hdf5read(file, '/NASTRAN/RESULT/NODAL/EIGENVECTOR');

This last operation is extremely slow (can be measured in hours).

Python:

import tables
hfile = tables.open_file("HDF5.h5")
modetable = hfile.root.NASTRAN.RESULT.NODAL.EIGENVECTOR
data = modetable.read()

This last operation is basically instant. I can then access data as if it were a numpy array. I am clearly missing something very basic about what these commands are doing. I'm thinking it might have something to do with data conversion but I'm not sure. If I do type(data) I get back numpy.ndarray and type(data[0]) returns numpy.void.

What is the correct (i.e. speedy) way to read the dataset I want into Matlab?


Solution

  • Matlab provided another HDF5 reader called h5read. Using the same basic approach the amount of time taken to read the data was drastically reduced. In fact hdf5read is listed for removal in a future version. Here is same basic code with the perfered functions.

    file = 'HDF5.h5';
    hinfo = h5info(file);
    % ... Find the dataset I want to extract
    t = h5read(file, '/NASTRAN/RESULT/NODAL/EIGENVECTOR');