I have a text file that looks like this:
word1 4
wöörd2 8
word3 12
word4 5
another word 1
many words one after another 1
word5 9
If it weren't for the lines with many words, the following code would work:
    f = open("C:\\path\\words.txt", 'r', encoding="utf-8")
    dict = {}
    for line in f:
        k, v = line.strip().split()
        dict[k.strip()] = v.strip()
    f.close()
But obviously in the above case I get ValueError: too many values to unpack (expected 2). I assume there are three options:
I find 3. to be too daunting for a big, diverse (in terms of characters and words) file (especially since I don't care that much about the problematic lines). But for 2., how do I check if there are more than 2 elements when I perform the split of the line?
There is no need to check. Just catch the exception:
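    # encoding="utf-8" is kept from the question; the file contains non-ASCII words
    with open("C:\\path\\words.txt", encoding="utf-8") as f:
        result = {}
        for line in f:
            try:
                k, v = line.split()
            except ValueError:
                # not exactly two fields on this line: skip it
                pass
            else:
                result[k] = v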
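Now the code will work for empty lines too, or lines with no spaces in between words: those produce fewer than two values, which also raises ValueError on unpacking, so they are skipped the same way.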
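If you would rather check the number of elements explicitly, as the question asks, a minimal sketch could look like the following. The helper name parse_pairs and the sample list are made up for illustration; the lines are taken from the example file in the question.

```python
def parse_pairs(lines):
    """Build a dict from lines holding exactly two whitespace-separated fields."""
    result = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            # Skip empty lines, single words, and lines with more than two fields.
            continue
        key, value = parts
        result[key] = value
    return result


# Hypothetical sample, mirroring a few lines of the file in the question.
sample = [
    "word1 4\n",
    "wöörd2 8\n",
    "another word 1\n",
    "\n",
    "word5 9\n",
]
print(parse_pairs(sample))
# {'word1': '4', 'wöörd2': '8', 'word5': '9'}
```

Any iterable of strings works here, so an open file object can be passed in place of the list. Both approaches skip the same lines; catching the exception just avoids the intermediate list and the length test.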
Note that I made a few more changes:
- Using with open(...) as f guarantees that f will be closed when the block is done (whatever happens).
- Don't use the name dict; that's the built-in type you are now shadowing. I used result instead.
- No need to use line.strip(), v.strip() or k.strip() when using str.split() with no arguments; the latter already removes leading and trailing whitespace from every split result:
>>> " str.strip() \t strips \f all whitespace \n".split()
['str.strip()', 'strips', 'all', 'whitespace']
You could make it a little more concise still by using the fact that dict.update() accepts a sequence of (key, value) tuples, and raises ValueError when an element does not have exactly two items:
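    # encoding="utf-8" is kept from the question; the file contains non-ASCII words
    with open("C:\\path\\words.txt", encoding="utf-8") as f:
        result = {}
        for line in f:
            try:
                # a one-element list holding the [key, value] pair for this line
                result.update([line.split()])
            except ValueError:
                pass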
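If the multi-word lines turn out to matter after all, one possible alternative (not something the question asks for) is to split only at the last run of whitespace with str.rsplit(None, 1), so that everything before the final number becomes the key. This is a sketch under that assumption; the helper name parse_keep_phrases is hypothetical.

```python
def parse_keep_phrases(lines):
    """Map everything before the last whitespace-separated field to that field."""
    result = {}
    for line in lines:
        # strip() first: with maxsplit=1, rsplit would otherwise keep
        # leading whitespace as part of the key.
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            # Empty lines and single words still have no key/value pair.
            continue
        key, value = parts
        result[key] = value
    return result


print(parse_keep_phrases(["word1 4\n", "many words one after another 1\n"]))
# {'word1': '4', 'many words one after another': '1'}
```

Note that this assumes the value is always the last field on the line, which holds for the sample in the question but may not hold for every file.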