I have a directory of files named in the format [alphanumeric]_[integer].wav,
for example:
gdg36dhd3d_0.wav
gdg36dhd3d_1.wav
gdg36dhd3d_2.wav
344fikuo4q_0.wav
344fikuo4q_1.wav
The alphanumeric part is the ID, and the integer after the underscore is the index. The following code should loop through a list of filenames, sum the index for each ID, and then plot a histplot of the data.
# sum the number of isolated events for every original sample
number_isolated_events = list()
for k, g in itertools.groupby(isolated_filenames, key=lambda x:re.search('(\w+)_', x).group(1)):
    number_events = len(list(g))
    number_isolated_events.append(number_events)
sns.histplot(number_isolated_events, kde=True, color='b')
However, when the above code is run with 20 items in isolated_filenames, the histplot looks like this:
The count is 1.0, which I assume means that only one filename is being matched by the regex '(\w+)_'.
The x-axis is also showing what seems like the total number of indexes combined. I would assume a maximum of no more than 6 indexes per ID. My guess is that all the indexes have been summed and associated with the first filename rather than their respective filenames somehow. Is the issue my RegEx expression or is there a better way to achieve what I'm looking for?
You can use
import itertools
isolated_filenames = ['gdg36dhd3d_0.wav','gdg36dhd3d_1.wav','gdg36dhd3d_2.wav','344fikuo4q_0.wav','344fikuo4q_1.wav']
l = [(x.rsplit('_')[0], x.rsplit('_')[-1][:-4]) for x in isolated_filenames]
number_isolated_events = []
for k, g in itertools.groupby(l, key=lambda x: x[0]):
    number_isolated_events.append(tuple([k, len(list(g))])) # if you need to count group items
    #number_isolated_events.append(tuple([k, sum(int(z[1]) for z in g)])) # if you need to sum numeric suffixes
print(number_isolated_events)
See the Python demo. Output:
[('gdg36dhd3d', 3), ('344fikuo4q', 2)]
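One thing worth checking, sketched below: itertools.groupby only merges adjacent items with equal keys, so if the filenames are not already ordered by ID (a directory listing may come back in arbitrary order), the same ID can appear as several small groups. Whether that is what happened in the question is only a guess. The helper names id_of and count_per_id and the shuffled list are made up for illustration.

```python
import itertools

def id_of(filename):
    # Hypothetical helper: everything before the last underscore is the ID.
    return filename.rsplit('_', 1)[0]

def count_per_id(filenames):
    # Sort by ID first so that equal IDs are adjacent before grouping.
    ordered = sorted(filenames, key=id_of)
    return [(k, len(list(g))) for k, g in itertools.groupby(ordered, key=id_of)]

# Same five files as above, but interleaved (illustrative input).
shuffled = ['gdg36dhd3d_0.wav', '344fikuo4q_0.wav', 'gdg36dhd3d_1.wav',
            '344fikuo4q_1.wav', 'gdg36dhd3d_2.wav']

# Without sorting, every run of equal IDs has length 1 here.
unsorted_counts = [len(list(g)) for _, g in itertools.groupby(shuffled, key=id_of)]
print(unsorted_counts)         # [1, 1, 1, 1, 1]
print(count_per_id(shuffled))  # [('344fikuo4q', 2), ('gdg36dhd3d', 3)]
```

Note that sorting changes the order of the result: IDs come back in sorted order rather than in order of first appearance.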
Notes:
- [(x.rsplit('_')[0], x.rsplit('_')[-1][:-4]) for x in isolated_filenames] creates a list of pairs: the file part before _ and the number after _ and before the .wav extension (I assume each file here has the .wav extension)
- itertools.groupby(l, key=lambda x: x[0]) groups by the file "root" name (without the numeric suffix)
- len(list(g)) fetches the number of items found per key
- sum(int(z[1]) for z in g) sums the numeric suffixes for each key.
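As an alternative that does not depend on input order, the same two results (count per ID, sum of suffixes per ID) can be sketched with collections.Counter and a defaultdict. This swaps in a different technique from the groupby approach above. The helper split_name is hypothetical, and it assumes every name has exactly the [ID]_[integer].[extension] shape.

```python
import os
from collections import Counter, defaultdict

def split_name(filename):
    # Hypothetical helper: 'gdg36dhd3d_2.wav' -> ('gdg36dhd3d', 2).
    stem, _ext = os.path.splitext(filename)
    file_id, index = stem.rsplit('_', 1)  # split on the last underscore only
    return file_id, int(index)

isolated_filenames = ['gdg36dhd3d_0.wav', 'gdg36dhd3d_1.wav', 'gdg36dhd3d_2.wav',
                      '344fikuo4q_0.wav', '344fikuo4q_1.wav']

# Number of files per ID.
counts = Counter(split_name(f)[0] for f in isolated_filenames)

# Sum of the numeric suffixes per ID.
index_sums = defaultdict(int)
for f in isolated_filenames:
    file_id, index = split_name(f)
    index_sums[file_id] += index

print(dict(counts))      # {'gdg36dhd3d': 3, '344fikuo4q': 2}
print(dict(index_sums))  # {'gdg36dhd3d': 3, '344fikuo4q': 1}
```

For the plot, list(counts.values()) can be passed to sns.histplot in place of number_isolated_events; with small integer counts, discrete=True may give more readable bins.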