I downloaded a hyperspectral dataset called "dataset.hdf5" from http://microbia.org/index.php/resources and I am trying to explore the data inside it:
import numpy as np
import h5py
hf = h5py.File("dataset.hdf5", "r")
hf.keys()
Output:
<KeysViewHDF5 ['CSSs', 'IMGs', 'SEGMs', 'agarFootprint', 'circularity', 'convexity', 'hemolysis', 'inertia', 'labels', 'labelsPathogens', 'positions', 'sizes', 'waves']>
dataset_IMGs = hf["IMGs"]
dataset_IMGs[:]
Output:
array([b'IMG_WLATRIO_51145900_T1080_TW0H1S1',
b'IMG_WLATRIO_51145900_T1080_TW0H1S1',
b'IMG_WLATRIO_51145900_T1080_TW0H1S1', ...,
b'IMG_WLATRIO_51144600_T1080_TW0H1S1',
b'IMG_WLATRIO_51144600_T1080_TW0H1S1',
b'IMG_WLATRIO_51144600_T1080_TW0H1S1'], dtype='|S35')
My goal is to extract those images in their original format, but what I see above looks like some kind of binary encoding. I searched for and tried several scripts, but none of them managed to extract the images.
Does anyone know what these entries are and how to extract the images from them?
I agree with @jacub: this file doesn't appear to contain any image data. IMGs is an array of file names, not pixel data. I used a utility (ptdump, which ships with PyTables) to get a summary of the datasets and their shapes. This is what I found:
C:\Users\walker\Downloads>ptdump dataset.hdf5
/ (RootGroup) ''
/CSSs (Array(10398, 125)) ''
/IMGs (Array(10398,)) ''
/SEGMs (Array(10398,)) ''
/agarFootprint (Array(10398, 125)) ''
/circularity (Array(10398,)) ''
/convexity (Array(10398,)) ''
/hemolysis (Array(10398,)) ''
/inertia (Array(10398,)) ''
/labels (Array(10398,)) ''
/labelsPathogens (Array(10398,)) ''
/positions (Array(10398, 2)) ''
/sizes (Array(10398,)) ''
/waves (Array(125,)) ''
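If PyTables isn't installed, a similar summary can be produced with h5py alone. This is only a sketch: `summarize` walks the file with `visititems` and records each dataset's shape and dtype, and `image_names` decodes the byte strings stored in `IMGs` into ordinary Python strings. Both function names are my own, and decoding as ASCII is an assumption based on the names shown in the question.

```python
import h5py


def summarize(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in the file,
    roughly the information ptdump prints."""
    summary = {}
    with h5py.File(path, "r") as hf:
        def visit(name, obj):
            # visititems calls this for every group and dataset; keep datasets only.
            # Returning None tells h5py to keep visiting.
            if isinstance(obj, h5py.Dataset):
                summary[name] = (obj.shape, obj.dtype)
        hf.visititems(visit)
    return summary


def image_names(path):
    """Decode the fixed-length byte strings in /IMGs and return the distinct names."""
    with h5py.File(path, "r") as hf:
        raw = hf["IMGs"][:]  # NumPy array of bytes, e.g. dtype |S35
    # bytes -> str; ASCII is an assumption based on the names seen in the output
    names = [b.decode("ascii") for b in raw]
    return sorted(set(names))
```

For example, `summarize("dataset.hdf5")` should list the same 13 datasets as the ptdump output, and `image_names("dataset.hdf5")` returns the distinct names. The `b'...'` prefix in the question's output just marks NumPy byte strings; the repeated names there suggest that many rows were taken from the same image file.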
The download page describes the file like this: "The hyperspectral database contains a selected collection of spectral signatures from bacteria colonies on solid blood agar plates. ... The database has the aim to offer a first benchmark to assess image analysis algorithms performances for this application."
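Going by that description, the useful content in this file is the spectra rather than images. The shapes above suggest that each of the 10398 rows of `CSSs` is one spectral signature sampled at the 125 wavelengths in `waves`, with one entry of `labels` per row; that pairing is my assumption from the matching shapes, not something the site states. A minimal sketch for loading them and averaging the spectra per label (`load_spectra` and `mean_spectrum_per_label` are hypothetical helper names):

```python
import h5py
import numpy as np


def load_spectra(path):
    """Load spectra, wavelength axis and labels.

    Assumption: each row of /CSSs is one colony's spectral signature sampled
    at the wavelengths in /waves, and /labels holds one label per row.
    """
    with h5py.File(path, "r") as hf:
        spectra = hf["CSSs"][:]   # shape (n_samples, n_bands)
        waves = hf["waves"][:]    # shape (n_bands,)
        labels = hf["labels"][:]  # shape (n_samples,)
    if spectra.shape[1] != waves.shape[0]:
        raise ValueError("CSSs columns do not match the number of wavelengths")
    return spectra, waves, labels


def mean_spectrum_per_label(spectra, labels):
    """Average the spectra of all rows that share the same label."""
    return {lab: spectra[labels == lab].mean(axis=0) for lab in np.unique(labels)}
```

With the real file, `spectra, waves, labels = load_spectra("dataset.hdf5")` followed by plotting `spectra[0]` against `waves` would show the first sample's signature.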
You can get raw image data from the links under the heading "MicrobIA Images Dataset (Beta ver. 0.1)". MicrobIA_Dataset...sample.zip has 20 images in 4 folders, so I'd start there. The other datasets seem to require an account login that I don't have.
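Once you have the sample ZIP, the standard `zipfile` module gives a quick view of its folder layout. `list_archive` is a hypothetical helper name, and I haven't looked inside the archive myself, so this sketch makes no assumption about the image format; it only groups file names by their top-level folder.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def list_archive(zip_path):
    """Group the files in a ZIP archive by their top-level folder."""
    folders = defaultdict(list)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            # ZIP member names always use forward slashes
            parts = PurePosixPath(info.filename).parts
            top = parts[0] if len(parts) > 1 else "."  # "." = archive root
            folders[top].append(info.filename)
    return dict(folders)
```

Call it with the path of the archive you downloaded; if the description is accurate, you should see 4 folders holding 20 files in total, which you can then extract with `zipfile.ZipFile(path).extractall(target_dir)`.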