
Reading a specific column from a file in Python when the last few rows have a different format


I have a problem reading a text file in Python. Basically, what I need is to get the column at index 4 of each row into a list.

With this small function I achieve it without any great issues:

def load_file(filename):
    # read every line, skipping the first useless row
    with open(filename, 'r') as f:
        lines = f.readlines()[1:]

    total_sp = []

    for line in lines:
        t = line.strip().split()
        total_sp.append(int(t[4]))

    return total_sp

But now I have to handle files whose last row(s) contain stray numbers that don't follow the format of the other rows. An example of a text file that does not work is:

#generated file
well10_1         3        18         6         1         2  -0.01158   0.01842       142
well5_1         1        14         6         1         2  0.009474   0.01842       141
well4_1         1        13         4         1         2  -0.01842  -0.03737       125
well7_1         3        10         1         1         2 -0.002632  0.009005       101
well3_1         1        10         9         1         2  -0.03579  -0.06368       157
well8_1         3        10        10         1         2  -0.06895   -0.1021       158
well9_1         3        10        18         1         2   0.03053   0.02158       176
well2_1         1         4         4         1         2  -0.03737  -0.03737       128
well6_1         3         4         5         1         2  -0.07053   -0.1421       127
well1_1        -2         3         1         1         2  0.006663  -0.02415       128
         1    0.9259
         2   0.07407

where the rows "1 0.9259" and "2 0.07407" have to be discarded.

In fact, using the function above with this text file, I get the following error because of the 2 additional rows at the end:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/tmp/tmpqi8Ktw.py", line 21, in load_obh
    total_sp.append(int(t[4]))
IndexError: list index out of range

How can I get rid of the last lines in the line variable?

Thanks to all


Solution

  • There are many ways to handle this. One is to catch the IndexError by wrapping the failing line in try and except, something like this:

    try:
        total_sp.append(int(t[4]))
    except IndexError:
        pass
    

    This appends to total_sp only when the index exists. It also silently skips any other row in the file that has no data at that index, not just the last two.

    Alternatively, if you only want to remove the last two rows (elements), you can use a slice: replace line = list(f.readlines()[1:]) with line = f.readlines()[1:-2]. This assumes the file always ends with exactly two extra rows.
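
For reference, here is a minimal sketch of both approaches as complete functions. The names load_file_skip_bad and load_file_drop_tail and the parameters col and n_tail are hypothetical, chosen only for illustration; the column index 4 and the two trailing rows are taken from the question.

```python
def load_file_skip_bad(filename, col=4):
    """Return column `col` as ints, skipping rows that are too short.

    Hypothetical helper: the name and the `col` parameter are assumptions.
    """
    total_sp = []
    with open(filename, "r") as f:
        next(f, None)  # skip the first (header) row, if there is one
        for line in f:
            fields = line.split()
            try:
                total_sp.append(int(fields[col]))
            except IndexError:
                # Row has fewer than col + 1 fields, e.g. the trailing
                # summary rows or a blank line: ignore it.
                continue
    return total_sp


def load_file_drop_tail(filename, col=4, n_tail=2):
    """Return column `col` as ints, dropping the header and the last `n_tail` rows.

    Hypothetical helper: assumes the file always ends with exactly
    `n_tail` malformed rows and has no trailing blank lines.
    """
    with open(filename, "r") as f:
        lines = f.readlines()
    # Explicit end index so that n_tail=0 keeps every data row
    # (a plain [1:-0] slice would return an empty list).
    data_lines = lines[1:len(lines) - n_tail]
    return [int(line.split()[col]) for line in data_lines]
```

The first version tolerates any number of short rows anywhere in the file, but it also hides rows that are malformed by accident. The second is stricter: it raises an error as soon as a row other than the last n_tail ones does not have the expected columns, which may be preferable if the extra rows are always in the same place.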