
Skipping lines when turning a text file into a dictionary


I have a text file that looks like this:

word1   4
wöörd2   8
word3   12
word4   5
another word   1
many words one after another 1
word5   9

If it weren't for the lines with many words, the following code would work:

f = open("C:\\path\\words.txt", 'r', encoding="utf-8")
dict = {}
for line in f:
    k, v = line.strip().split()
    dict[k.strip()] = v.strip()

f.close()

But with the multi-word lines in the sample above, I get ValueError: too many values to unpack (expected 2). As I see it, there are three options:

  1. Deleting the offending lines from the text file, which is difficult to do by hand in a huge file.
  2. Skipping the line if such a problem occurs.
  3. Modifying the code such that the value is always the last number.

I find option 3 too daunting for a big file that is diverse in terms of characters and words, especially since I don't care much about the problematic lines. But for option 2, how do I check whether a line splits into more than two elements?
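If you do want an explicit check for option 2, one way is to split once, keep the resulting list, and test its length before unpacking. A minimal sketch: `load_pairs` is a hypothetical helper name, and it accepts any iterable of lines (an open file object or, as here, an in-memory sample) so it can be tried without the real file:

```python
import io


def load_pairs(lines):
    """Build a dict from 'key value' lines, skipping every other line.

    `load_pairs` is a hypothetical helper; any iterable of strings works,
    including an open file object.
    """
    result = {}
    for line in lines:
        parts = line.split()
        # Keep only lines that split into exactly two fields.
        if len(parts) == 2:
            key, value = parts
            result[key] = value
    return result


# The sample data from the question, held in memory instead of a file.
sample = io.StringIO(
    "word1   4\n"
    "wöörd2   8\n"
    "word3   12\n"
    "word4   5\n"
    "another word   1\n"
    "many words one after another 1\n"
    "word5   9\n"
)
print(load_pairs(sample))
# {'word1': '4', 'wöörd2': '8', 'word3': '12', 'word4': '5', 'word5': '9'}
```

With the real file, pass the file object instead: `with open("C:\\path\\words.txt", encoding="utf-8") as f: result = load_pairs(f)`.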


Solution

  • There is no need to check. Just catch the exception:

    # Explicit UTF-8, as in the question, so keys such as wöörd2 decode correctly
    with open("C:\\path\\words.txt", encoding="utf-8") as f:
        result = {}
        for line in f:
            try:
                k, v = line.split()
            except ValueError:
                pass
            else:
                result[k] = v
    

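    This version also copes with empty lines and with lines that contain a single word and no spaces: unpacking those raises ValueError too, so they are skipped rather than crashing the loop.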
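To see exactly which lines the try/except loop drops, here is the same loop wrapped in a function (the name `load_pairs_eafp` is hypothetical) and fed a small in-memory sample instead of the file:

```python
import io


def load_pairs_eafp(lines):
    """Same loop as the answer: keep 'key value' lines, skip everything else."""
    result = {}
    for line in lines:
        try:
            k, v = line.split()
        except ValueError:
            # Raised for 0, 1, or 3+ fields: empty lines, single words,
            # and multi-word lines all end up here.
            pass
        else:
            result[k] = v
    return result


sample = io.StringIO("word1   4\n\nlonelyword\nanother word   1\nword5   9\n")
print(load_pairs_eafp(sample))  # {'word1': '4', 'word5': '9'}
```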

    Note that I made a few more changes:

    • Using with open(...) as f guarantees that f is closed when the block exits, even if an exception is raised.

    • Don't use the name dict; that's the built-in type you are now shadowing. I used result instead.

    • No need to use line.strip(), k.strip() or v.strip() when calling str.split() with no arguments: it splits on runs of any whitespace and discards leading and trailing whitespace, so none of the resulting items has surrounding whitespace:

      >>> "   str.strip() \t    strips   \f  all  whitespace  \n".split()
      ['str.strip()', 'strips', 'all', 'whitespace']
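A quick illustration of why shadowing dict matters: once the name is rebound to a dictionary object, a later dict(...) call in the same scope no longer reaches the built-in type. The function name `shadowing_demo` is purely illustrative:

```python
def shadowing_demo():
    """Show what goes wrong after rebinding the name `dict` locally."""
    dict = {}  # shadows the built-in type inside this function
    try:
        dict(a=1)  # meant to build a new dictionary, but calls the {} object
    except TypeError as exc:
        return str(exc)
    return "no error"


print(shadowing_demo())  # 'dict' object is not callable
```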
      

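    You could make it a little more concise still by using the fact that dict.update() accepts an iterable of key/value pairs. Wrapping line.split() in a list passes it as a single pair; if that list does not have exactly two items, update() raises ValueError, which is caught just as before: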

    with open("C:\\path\\words.txt", encoding="utf-8") as f:
        result = {}
        for line in f:
            try:
                result.update([line.split()])
            except ValueError:
                pass
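Option 3 from the question (always take the last field as the value) needs only one extra call. With no separator and maxsplit=1, str.rsplit() splits at the last run of whitespace only, so a multi-word key stays in one piece. A sketch, assuming the value is always the final whitespace-separated field on a line; `load_pairs_last_field` is a hypothetical name:

```python
import io


def load_pairs_last_field(lines):
    """Use each line's last field as the value and everything before it as the key.

    `load_pairs_last_field` is a hypothetical helper name.
    """
    result = {}
    for line in lines:
        try:
            # strip() drops leading whitespace that rsplit would otherwise keep in the key;
            # rsplit(maxsplit=1) splits only at the last run of whitespace.
            k, v = line.strip().rsplit(maxsplit=1)
        except ValueError:
            # Empty lines and single-field lines still cannot be unpacked.
            continue
        result[k] = v
    return result


sample = io.StringIO(
    "word1   4\n"
    "another word   1\n"
    "many words one after another 1\n"
    "\n"
)
print(load_pairs_last_field(sample))
# {'word1': '4', 'another word': '1', 'many words one after another': '1'}
```

With this variant the multi-word lines are kept rather than skipped; only empty lines and lines with a single field are still dropped.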