In a directory images, images are named like - 1_foo.png, 2_foo.png, 14_foo.png, etc.
The images are OCR'd and the text extract is stored in a dict
by the code below -
data_dict = {}
for i in os.listdir(images):
if str(i[1]) != '_':
k = str(i[:2]) # Get first two characters of image name and use as 'key'
else:
k = str(i[:1]) # Get first character of image name and use 'key'
# Intiates a list for each key and allows storing multiple entries
data_dict.setdefault(k, [])
data_dict[k].append(pytesseract.image_to_string(i))
The code performs as expected.
The images can have varying numbers in their name ranging from 1 to 99.
Can this be reduced to a dictionary comprehension
?
Yes. Here's one way, but I wouldn't recommend it:
{k: d.setdefault(k, []).append(pytesseract.image_to_string(i)) or d[k]
for d in [{}]
for k, i in ((i.split('_')[0], i) for i in names)}
That might be as clean as I can make it, and it's still bad. Better use a normal loop, especially a clean one like Dennis's.
Slight variation (if I do the abuse once, I might as well do it twice):
{k: d.setdefault(k, []).append(pytesseract_image_to_string(i)) or d[k]
for d in [{}]
for i in names
for k in i.split('_')[:1]}
Edit: kaya3 now posted a good one using a dict comprehension. I'd recommend that over mine as well. Mine are really just the dirty results of me being like "Someone said it can't be done? Challenge accepted!".