I have about a thousand files that are named in a semi-sensible way like the following:
aaa.ba.ca.01
aaa.ba.ca.02
aaa.ba.ca.03
aaa.ba.da.01
aaa.ba.da.02
aaa.ba.da.03
and so on. Let's say each file contains two columns of numbers, which I need to read into two dictionaries: wavelength and flux. The reading part is easy for me; the hard part is that I need to load these dictionaries so that they store the information like this:
wavelength['aaa.ba.ca.01'] (the wavelengths of that one file)
wavelength['aaa.ba.ca'] (the wavelengths of all its subfiles, i.e. ...ca.01, ...ca.02 and ...ca.03, in order)
wavelength['aaa.ba'] (which likewise includes the wavelengths of all its subfiles, again in order)
and so on. The filenames are well-behaved (the sections are separated by periods, the grouping hierarchy always runs in the same direction, etc.), but a name can be anywhere from 4 to 8 sections long.
My question: is there some sensible way to have Python glob the file names and, by parsing strings or some other magic, get the data into these dictionaries? I've hit a brick wall.
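Because the sections are always separated by periods, every group key a file belongs to can be derived by splitting its name on `.` and rejoining successively longer prefixes. A minimal sketch, assuming the group keys are exactly these dot-separated prefixes (the function name `group_keys` is hypothetical):

```python
def group_keys(filename):
    """Return every dot-separated prefix of filename, shortest first.

    Hypothetical helper: assumes each group key is a leading run of
    whole sections of the filename.
    """
    parts = filename.split(".")
    # Join the first 1, 2, ..., len(parts) sections back together.
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(group_keys("aaa.ba.ca.01"))
# prints ['aaa', 'aaa.ba', 'aaa.ba.ca', 'aaa.ba.ca.01']
```

This works the same way for names of 4 or 8 sections, since it never assumes a fixed depth.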
A simple, but not efficient, way to do this is to subclass Python's dict so that, when given an incomplete key, it concatenates the contents of all matching keys in alphabetical order.
There could be more efficient designs: this one's major drawback is that every partial-key lookup sorts and checks all existing keys. On the other hand, it is so simple to implement that it is worth a try:
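```python
class MultiDict(dict):
    """dict that concatenates the values of all matching keys on a partial-key lookup."""

    def __getitem__(self, key):
        # An exact (complete) key behaves like a normal dict lookup.
        if key in self:
            return dict.__getitem__(self, key)
        # Otherwise, concatenate the values of every key that starts with
        # the partial key, in alphabetical order of the keys.
        result = []
        for complete_key in sorted(self.keys()):
            if complete_key.startswith(key):
                result.extend(dict.__getitem__(self, complete_key))
        return result


# example
a = MultiDict()
a["a0"] = [1]
a["a1"] = [2]
print(a["a"])  # prints [1, 2]
```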
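One caveat with matching on `startswith(key)`: a partial key such as `'aaa.b'` also matches `'aaa.ba.ca.01'`, because the comparison ignores section boundaries. A sketch of a variant that only matches whole dot-separated sections, assuming periods never appear inside a section (the class name `SectionDict` is hypothetical):

```python
class SectionDict(dict):
    """dict whose partial keys only match on whole '.'-separated sections."""

    def __getitem__(self, key):
        # An exact (complete) key behaves like a normal dict lookup.
        if key in self:
            return dict.__getitem__(self, key)
        # Requiring a trailing '.' makes the match end at a section boundary.
        prefix = key + "."
        result = []
        for complete_key in sorted(self.keys()):
            if complete_key.startswith(prefix):
                result.extend(dict.__getitem__(self, complete_key))
        return result


d = SectionDict()
d["aaa.ba.ca.01"] = [1]
d["aaa.ba.ca.02"] = [2]
d["aaa.ba.da.01"] = [3]
print(d["aaa.ba.ca"])  # prints [1, 2]
print(d["aaa.ba"])     # prints [1, 2, 3]
print(d["aaa.b"])      # prints []
```

Both versions rely on alphabetical key order matching the intended order; that holds for the zero-padded numbers shown in the question (`01`, `02`, ...), but `10` would sort before `2` without padding.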
As for getting the data into the dictionary, just iterate over all the files with glob or os.listdir, read the desired contents of each file as a list, and store it in a MultiDict under the filename as the key.
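A minimal sketch of that loading loop, assuming each file holds two whitespace-separated numeric columns with no header or comment lines (the function name `load_spectra` is hypothetical):

```python
import glob
import os


def load_spectra(pattern, wavelength, flux):
    """Read two numeric columns from every file matching a glob pattern.

    Hypothetical helper: stores each file's columns under its base
    filename in the dict-like objects `wavelength` and `flux`.
    Assumes whitespace-separated columns and no header/comment lines.
    """
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)  # e.g. 'aaa.ba.ca.01'
        waves, fluxes = [], []
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) < 2:
                    continue  # skip blank or short lines
                waves.append(float(fields[0]))
                fluxes.append(float(fields[1]))
        wavelength[name] = waves
        flux[name] = fluxes
```

Passing two `MultiDict()` instances as `wavelength` and `flux`, e.g. `load_spectra("aaa.*", wavelength, flux)`, means a lookup such as `wavelength['aaa.ba']` then returns the concatenated wavelengths of every matching file.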