python parsing loops percentage readlines

python parsing input, making a sum of the input

I am attempting to parse a file with the following format:

1999
I
Willem Jan van Steen         9859  77
Guillaume Kielmann           5264  77
Guillaume Bos                8200   6

(The file is much longer, and is separated by academic year (such as 1999) and by study (such as 'I').) The only thing I have to work with is the last number on each line (77, 77 and 6 above), which is a percentage. The final goal is to make a bar chart of 10 bars, where the height of each bar is the number of times a percentage from the file falls into that bar's range. For example, take the bar covering 70 to 80%: if the input above were the whole file, the count would be 2, so that bar would have height 2.

My first problem is that I don't know how to parse the input. I was thinking that Python should read the lines and then, starting from the index at which the percentage begins, do something with the numbers: work out which bar's range each one falls into, and then loop to count how many times a percentage falls into each bar.

Hope someone can help me!

Solution

Use str.rsplit() to split a string into words, counting from the right. Passing None as the separator splits on arbitrary-width whitespace and gives you neatly stripped strings; passing a count limits the number of splits, so the whitespace inside the first column (the name) is kept.

Short demo of what that means:

>>> 'Willem Jan van Steen         9859  77\n'.rsplit(None, 2)
['Willem Jan van Steen', '9859', '77']

Here the spaces in the name are preserved, but the two numbers at the end are now separate elements in a list. The newline at the end is gone.
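
The year and study lines from the question contain only a single word, so the same call returns a shorter list for them. That gives a cheap way to tell them apart from the data lines. A quick check, using lines copied from the sample in the question (the variable names here are just for illustration):

```python
# Lines taken from the sample input in the question.
lines = ["1999\n", "I\n", "Guillaume Bos                8200   6\n"]

# Split each line from the right, at most twice, on any whitespace.
results = [line.rsplit(None, 2) for line in lines]

for columns in results:
    # Header lines yield 1 element, data lines yield 3.
    print(len(columns), columns)
```

Only the last line produces three elements, so checking the length of the list is enough to skip the headers.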

If you loop over an open file object, you get separate lines, giving you a method to parse a file line by line:

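count = 0  # how many percentages fall in the 70-80 range

# inputfilename is assumed to hold the path to your data file
with open(inputfilename) as inputfh:
    for line in inputfh:
        columns = line.rsplit(None, 2)
        if len(columns) < 3:
            continue  # not a line with name and numbers
        percentage = int(columns[2])
        if 70 <= percentage <= 80:
            # we have a line that falls within your criteria
            count += 1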
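
To get from one range to all 10 bars, the same loop can compute a bin index for each percentage instead of testing a single range. The following is a minimal sketch, not a definitive implementation. It assumes the percentages are whole numbers from 0 to 100 and that the bars are half-open ranges (0-9, 10-19, ..., 90-100, with 100 folded into the last bar). The function name count_percentages and the bins parameter are made up for this example.

```python
def count_percentages(lines, bins=10):
    """Count how many percentages fall into each of `bins` equal ranges."""
    counts = [0] * bins
    for line in lines:
        columns = line.rsplit(None, 2)
        if len(columns) < 3:
            continue  # year or study header, or an empty line
        try:
            percentage = int(columns[2])
        except ValueError:
            continue  # last column is not a whole number
        if not 0 <= percentage <= 100:
            continue  # assumption: values outside 0-100 are ignored
        # Integer division maps 0-9 to bin 0, 10-19 to bin 1, and so on;
        # min() folds a percentage of exactly 100 into the last bin.
        index = min(percentage * bins // 100, bins - 1)
        counts[index] += 1
    return counts


# The sample lines from the question stand in for an open file here.
sample = [
    "1999\n",
    "I\n",
    "Willem Jan van Steen         9859  77\n",
    "Guillaume Kielmann           5264  77\n",
    "Guillaume Bos                8200   6\n",
]
counts = count_percentages(sample)
print(counts)
```

Because the function only iterates over its argument, an open file object can be passed in place of the list. Each entry of counts is then the height of one bar.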