I need some help.
I have an HDF5 file containing spectrum data (time, frequencies, and the power level of a given frequency at a given time). Here is how the file is structured (using HDFView):
The main groups (keys) are the HOURS, and inside each hour are the minutes, with each minute stored under its own key. Data was collected every 0.02 seconds for 60 seconds, so there are 3000 rows per minute. There are 256 frequency bins (i.e. they start at 1 MHz and end at 26 MHz, evenly spaced). For example: 23 --> 23:10 --> 2D array of the power:
0 0 1 2 ..... 255
1 -53.672386 -53.82235 -53.773468 ..... -50.566887
2 -53.85694 -53.945183 -53.63385 ..... -51.306465
3 -53.709038 -53.55101 -53.55305 ..... -52.7324906
.
.
.
2999 -53.23989 -51.501495 -50.681602 -52.227474
I am able to access the data for individual minutes, pull it into arrays, and then plot it. Like this:
import h5py
import numpy as np
import matplotlib.pyplot as plt
# Read in the HDF5 file
file = h5py.File("/home/tom/Desktop/2021-10-28_ch0.hdf5", 'r')
# The main groups in the file are hours: 20, 22, etc.
# Select one of the hours (e.g. 23)
hour = file['23']
# List the keys within the chosen hour. They are named "hour:minute", e.g. 23:10
#for key in hour.keys():
#    print( key )
# Select key with data for minutes 10, 11, 12, 13 and save into individual arrays:
minute_data_10=hour['23:10'][()]
minute_data_11=hour['23:11'][()]
minute_data_12=hour['23:12'][()]
minute_data_13=hour['23:13'][()]
# Generate a 1D array of TIME spanning 4 minutes (because we ingested
# 4x 1 minute slices of data):
time = np.linspace(0, 60*4, 3000*4)
# Generate a 1D array of FREQUENCY
frequency = np.linspace(1.575E0, 26.82402336E0, 256)
# Combine minute_data_10 minute_data_11 minute_data_12 and minute_data_13 along the time axis (axis=0)
comb_min = np.concatenate( (minute_data_10, minute_data_11, minute_data_12, minute_data_13), axis=0 )
print( comb_min.shape )
# Plot the data
im = plt.pcolormesh(frequency, time, comb_min, cmap='jet')
plt.colorbar(im).ax.tick_params(labelsize=10)
plt.title('Spectrum')
plt.ylabel('Seconds ago...')
#plt.xlabel('frequency in Hz')
im.axes.xaxis.set_ticklabels([])
plt.show()
I am manually defining each minute (10, 11, 12, 13), combining them, and then plotting them.
BUT... what I would like to do is automatically ingest ALL minutes for ALL hours of my choosing and then plot them in one plot. For example, how can I ingest ALL minutes in hour 15 and then plot the spectrum? Or, how could I plot the first 5 hours of the data?
HDF5 files are self-describing. (In other words, you can get the group or dataset names from the file; you don't have to know them in advance.) As noted above, you do this with the .keys() method. (Note: h5py objects are NOT dictionaries; h5py simply uses Python's dictionary syntax to access the names.)
Using the keys/names has the additional benefit of only reading existing datasets. Looking at your image, there are datasets for time 15:00 and 15:02 but not for 15:01. (This gap has additional implications when creating your plot, but that's a different problem.)
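To find out which minutes are missing before plotting, the dataset names can be compared against the 60 names a complete hour would have. This is only a minimal sketch: it assumes the names are zero-padded 'hh:mm' strings as in the question, missing_minutes is a hypothetical helper, and the names list is made up for illustration.

```python
def missing_minutes(names, hour):
    """Return the 'hh:mm' names of `hour` that are absent from `names`."""
    present = set(names)
    # Assumption: a complete hour has datasets named hh:00 .. hh:59
    expected = [f"{hour}:{mm:02d}" for mm in range(60)]
    return [name for name in expected if name not in present]

# Names as they might come from list(h5f['15'].keys()); illustrative only.
names = ["15:00", "15:02", "15:03"]
gaps = missing_minutes(names, "15")
print(gaps[:3])   # ['15:01', '15:04', '15:05']
```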
The code below shows how to read every dataset by name. It uses the same approach as your code: create a list of h5py objects, then combine them into a single array with np.concatenate. It also collects the hh:mm times (from the dataset names) in a list you can use to create the time array.
I used Python's file context manager. This is preferred over open/close methods (avoids leaving files open, and improves readability).
Simple example (hard coded for ['15'] hour group):
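import h5py
import numpy as np

with h5py.File('/home/tom/Desktop/2021-10-28_ch0.hdf5', 'r') as h5f:
    times = []
    collect = []
    hh = '15'
    for hhmm in h5f[hh].keys():
        times.append(hhmm)              # dataset name, e.g. '15:00'
        collect.append(h5f[hh][hhmm])   # h5py dataset object
    # Concatenate while the file is still open; this reads the data into memory
    comb_min = np.concatenate( collect, axis=0 )
print(times)
print(len(collect), comb_min.shape)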
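One possible way to turn the list of hh:mm names into a time array is sketched below. It assumes every dataset has exactly 3000 rows sampled every 0.02 s (the figures given in the question); time_axis and the two constants are hypothetical names. Because each dataset's start time comes from its name, a missing minute shows up as a jump in the time values instead of being silently closed up.

```python
import numpy as np

ROWS_PER_MINUTE = 3000   # assumption from the question: 60 s at 0.02 s per row
DT = 0.02                # seconds between rows

def time_axis(names, rows_per_minute=ROWS_PER_MINUTE, dt=DT):
    """Seconds since midnight for every row, given 'hh:mm' dataset names."""
    within_minute = np.arange(rows_per_minute) * dt
    starts = []
    for name in names:
        hh, mm = name.split(":")
        starts.append(int(hh) * 3600 + int(mm) * 60)
    # One row of offsets per dataset, flattened in dataset order
    return (np.asarray(starts)[:, None] + within_minute[None, :]).ravel()

t = time_axis(["15:00", "15:02"])
print(t.shape)          # (6000,)
print(t[0], t[3000])    # 54000.0 54120.0
```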
More general example (reads all groups[hours] and datasets['hh:mm']):
with h5py.File('/home/tom/Desktop/2021-10-28_ch0.hdf5', 'r') as h5f:
    times = []
    collect = []
    for hh in h5f.keys():
        for hhmm in h5f[hh].keys():
            times.append(hhmm)
            collect.append(h5f[hh][hhmm])
    # Concatenate while the file is still open; this reads the data into memory
    comb_min = np.concatenate( collect, axis=0 )
print(times)
print( len(collect), comb_min.shape )
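To read only the hours you choose, the same loop can be wrapped in a function that takes a list of hour names. This is a rough sketch, not a tested solution for your file: collect_hours is a hypothetical helper, and it assumes the dataset names are zero-padded so that sorting them as strings gives chronological order.

```python
import numpy as np

def collect_hours(h5f, hours):
    """Read every minute dataset in the given hour groups.

    h5f   : an open h5py.File (any nested mapping of arrays also works)
    hours : iterable of hour-group names, e.g. ['15', '16']
    Returns (names, data): dataset names and all rows stacked along axis 0.
    """
    names, blocks = [], []
    for hh in hours:
        group = h5f[hh]
        for hhmm in sorted(group.keys()):
            names.append(hhmm)
            blocks.append(group[hhmm][()])   # [()] reads the whole dataset
    return names, np.concatenate(blocks, axis=0)
```

With the file open, collect_hours(h5f, ['15']) would read all of hour 15, and collect_hours(h5f, sorted(h5f.keys(), key=int)[:5]) would read the first five hours present, assuming the hour names are integer strings. Keep an eye on memory: a full day at this resolution is roughly 1.1 billion values (24 x 60 x 3000 x 256), so plotting many hours may require downsampling first.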