Tags: c#, hdf5, hdf5dotnet

Load dataset from HDF5 file in C#


I'm trying to load a dataset from an HDF5 file in C# (.NET Framework) so that I end up with its contents in an array, e.g. float[,]. I found the HDF.PInvoke library, but I find it very difficult to figure out how to use it.

Update

With Soonts' answer, I managed to get it working. Here's my working snippet:

using System;
using System.Runtime.InteropServices;
using HDF.PInvoke;

namespace MyNamespace
{
    class Program
    {
        static void Main()
        {
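            string datasetPath = "/dense1/dense1/kernel:0";
            long fileId = H5F.open(@"\path\to\weights.h5", H5F.ACC_RDONLY);
            long dataSetId = H5D.open(fileId, datasetPath);

            // read array (the shape could be queried with H5S.get_simple_extent_ndims
            // and H5S.get_simple_extent_dims instead of being hard-coded)
            float[,] arr = new float[162, 128];
            // pin the array so the GC cannot move it while HDF5 writes into it
            GCHandle gch = GCHandle.Alloc(arr, GCHandleType.Pinned);
            try
            {
                // H5T.NATIVE_FLOAT is the in-memory type; HDF5 converts from the
                // type stored in the file if it differs
                H5D.read(dataSetId, H5T.NATIVE_FLOAT, H5S.ALL, H5S.ALL, H5P.DEFAULT,
                         gch.AddrOfPinnedObject());
            }
            finally
            {
                gch.Free();
                // release the HDF5 identifiers
                H5D.close(dataSetId);
                H5F.close(fileId);
            }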

            // show one entry
            Console.WriteLine(arr[13, 87].ToString());

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}
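Instead of hard-coding 162 x 128, the shape can be taken from the dataset's dataspace, as the comment in the snippet suggests. In HDF.PInvoke that should be `H5D.get_space(dataSetId)` for the dataspace, `H5S.get_simple_extent_ndims(spaceId)` for the rank and `H5S.get_simple_extent_dims(spaceId, dims, null)` to fill a `ulong[] dims`. The sketch below covers only the managed half, assuming `dims` has already been filled that way: it checks that the dataset is 2D and allocates a matching `float[,]`. `ShapeHelper` and `Allocate2D` are hypothetical names, not part of HDF.PInvoke.

```csharp
using System;

static class ShapeHelper
{
    // Allocates a float[rows, cols] buffer for a rank-2 dataset whose extent
    // is given as ulong values (HDF5's hsize_t is a 64-bit unsigned integer).
    public static float[,] Allocate2D(ulong[] dims)
    {
        if (dims == null)
            throw new ArgumentNullException(nameof(dims));
        if (dims.Length != 2)
            throw new ArgumentException(
                $"Expected a 2D dataset, got rank {dims.Length}.", nameof(dims));

        // .NET arrays are indexed with int, so reject extents that do not fit.
        if (dims[0] > int.MaxValue || dims[1] > int.MaxValue)
            throw new ArgumentOutOfRangeException(
                nameof(dims), "Extent too large for a .NET array.");

        return new float[(int)dims[0], (int)dims[1]];
    }
}
```

With `dims` equal to `{ 162, 128 }`, `ShapeHelper.Allocate2D(dims)` returns the same 162 x 128 array the snippet creates by hand, and a 1D or 3D dataset fails early instead of being read into a wrongly shaped buffer.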

Original first attempt:

What I've managed so far:

using System;
using System.IO;
using System.Runtime.InteropServices;
using HDF.PInvoke;

namespace MyNamespace
{
    class Program
    {
        static void Main()
        {
            string datasetPath = "/dense1/dense1/bias:0";
            long fileId = H5F.open(@"\path\to\weights.h5", H5F.ACC_RDONLY);
            long dataSetId = H5D.open(fileId, datasetPath);
            long typeId = H5D.get_type(dataSetId);
            long spaceId = H5D.get_space(dataSetId);

            // not sure about this
            TextWriter tw = Console.Out;
            GCHandle gch = GCHandle.Alloc(tw);

            // I was hoping that this would write to the console, but the
            // program crashes outside the scope of the C# debugger.
            H5D.read(
                dataSetId,
                typeId,
                H5S.ALL,
                H5S.ALL,
                H5P.DEFAULT,
                GCHandle.ToIntPtr(gch)
            );

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}

The signature for H5D.read() is:

Type    Name            Description
--------------------------------------------------------------
long    dset_id         Identifier of the dataset read from.
long    mem_type_id     Identifier of the memory datatype.
long    mem_space_id    Identifier of the memory dataspace.
long    file_space_id   Identifier of the dataset's dataspace in the file.
long    plist_id        Identifier of a transfer property list for this I/O operation.
IntPtr  buf             Buffer to receive data read from file.
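The `buf` parameter is simply the address at which the native library starts writing the data, so it must point at a pinned buffer that is large enough. This is also why the first attempt crashes: `GCHandle.ToIntPtr(gch)` returns the handle's internal representation rather than the address of any data, and the handle refers to a `TextWriter` rather than a float buffer, so `H5D.read` writes the dataset into memory it does not own. The sketch below imitates the native side with `Marshal.Copy` instead of calling HDF5 (`PinnedBufferDemo`, `ReadInto` and `FillViaPointer` are hypothetical names) to show values landing in a pinned managed array through `AddrOfPinnedObject()`.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinnedBufferDemo
{
    // Stands in for a native reader such as H5D.read: it only receives an
    // IntPtr and writes raw float values starting at that address.
    static void FillViaPointer(IntPtr buf, float[] values)
    {
        Marshal.Copy(values, 0, buf, values.Length);
    }

    // Pins a managed array so the GC cannot move it while native code holds
    // its address, then passes the address of its first element.
    public static float[] ReadInto(int count, float[] simulatedFileData)
    {
        if (simulatedFileData.Length > count)
            throw new ArgumentException("Buffer too small for the data.");

        float[] data = new float[count];
        GCHandle gch = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // AddrOfPinnedObject() is the address of data[0]; GCHandle.ToIntPtr(gch)
            // is the handle itself and must not be used as a data buffer.
            FillViaPointer(gch.AddrOfPinnedObject(), simulatedFileData);
        }
        finally
        {
            gch.Free();
        }
        return data;
    }
}
```

`PinnedBufferDemo.ReadInto(3, new float[] { 1f, 2f, 3f })` returns an array holding 1, 2 and 3, written entirely through the raw pointer, which is the same contract `H5D.read` relies on.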

Question

Could anyone help me fill in the blanks here?


Solution

  • You need to create an array (a normal 1D one, not a 2D one) of the correct size and type. Then write something like this:

    int width = 1920, height = 1080;
    float[] data = new float[ width * height ];
    var gch = GCHandle.Alloc( data, GCHandleType.Pinned );
    try
    {
        H5D.read( /* skipped */, gch.AddrOfPinnedObject() );
    }
    finally
    {
        gch.Free();
    }
    

    This reads the dataset into the data array; you can then copy individual rows into a separate 2D array if you need one.

    Read the API documentation on how to get the number of dimensions (HDF5 supports datasets with an arbitrary number of dimensions) and the size of each dimension (for a 2D dataset, that is two integers), i.e. how to work out the buffer size you need (for a 2D dataset, width * height elements).

    As for the element type, you had better know it in advance; float, for example, is fine.
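To make the last two points concrete: the element count of a dataset of any rank is the product of its extents, which in HDF.PInvoke should come from `H5S.get_simple_extent_dims` on the dataset's dataspace (not shown here). Because HDF5 lays data out in row-major (C) order, element `(row, col)` of a `height x width` dataset ends up at index `row * width + col` of the 1D buffer. The sketch below is plain C# with no HDF5 calls; `DatasetBuffers`, `ElementCount` and `To2D` are hypothetical names, and it assumes `float` elements.

```csharp
using System;

static class DatasetBuffers
{
    // Number of elements in a dataset of any rank: the product of its extents.
    // An empty dims array (a scalar dataspace) gives 1.
    public static int ElementCount(ulong[] dims)
    {
        ulong count = 1;
        foreach (ulong d in dims)
            count = checked(count * d); // throw OverflowException rather than wrap

        if (count > int.MaxValue)
            throw new OverflowException("Dataset too large for a single .NET array.");
        return (int)count;
    }

    // Copies a row-major 1D buffer of height * width floats into a 2D array.
    // HDF5 stores the last dimension fastest, so (row, col) is at row * width + col.
    public static float[,] To2D(float[] data, int height, int width)
    {
        if (data.Length != height * width)
            throw new ArgumentException(
                "data.Length must equal height * width.", nameof(data));

        var result = new float[height, width];
        for (int row = 0; row < height; row++)
            for (int col = 0; col < width; col++)
                result[row, col] = data[row * width + col];
        return result;
    }
}
```

For the 1920 x 1080 example, assuming the dataset's dims are `{ 1080, 1920 }` (rows first), `new float[DatasetBuffers.ElementCount(dims)]` allocates 2,073,600 floats, i.e. 8,294,400 bytes, and `DatasetBuffers.To2D(data, 1080, 1920)` rebuilds the 2D view after the read.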