I am attempting to parse a file with the following format:
1999
I
Willem Jan van Steen 9859 77
Guillaume Kielmann 5264 77
Guillaume Bos 8200 6
(The file is much longer and is separated by academic year (such as 1999) and by study (such as 'I').) The only thing I have to work with is the last number on each line (77, 77 and 6 here), which is a percentage. My final goal is to make a bar chart consisting of 10 bars, where the height of each bar is the number of times a percentage from the file falls into that bar's range. For example, for a bar covering 70 to 80%, if the input above were the whole file, the count would be 2 and the bar would have height 2.
But my first problem is that I don't know how to parse the input. I was thinking that Python should read the lines and then, starting from the index at which the percentage begins, do something with the numbers: look up which bar's range they fall into and then loop to count how many times a percentage falls into each bar.
Hope someone can help me!
Use str.rsplit() to split a string into words, counting from the right. If you pass in None as the separator, it splits on arbitrary-width whitespace and gives you neatly stripped strings. The second argument is a count that limits the number of splits, which lets you keep the whitespace inside the first column (the name).
Short demo of what that means:
>>> 'Willem Jan van Steen 9859 77\n'.rsplit(None, 2)
['Willem Jan van Steen', '9859', '77']
Here the spaces in the name are preserved, but the two numbers at the end are now separate elements in a list. The newline at the end is gone.
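The same call also shows how the year and study lines differ from the data lines: they yield fewer than three elements. A small sketch using lines from the question; split_columns is a hypothetical helper name, not something from the question:

```python
def split_columns(line):
    """Split a line into at most three columns, counting from the right."""
    return line.rsplit(None, 2)


# Two header lines (year, study) and one data line from the question.
samples = ["1999\n", "I\n", "Guillaume Bos 8200 6\n"]
results = [split_columns(sample) for sample in samples]

for sample, columns in zip(samples, results):
    print(repr(sample), "->", columns)
```

Checking the length of the returned list is therefore enough to skip the header lines, as long as no header consists of three or more words.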
If you loop over an open file object, you get separate lines, giving you a method to parse a file line by line:
with open(inputfilename) as inputfh:
    for line in inputfh:
        columns = line.rsplit(None, 2)
        if len(columns) < 3:
            continue  # not a line with name and numbers
        percentage = int(columns[2])
        if 70 <= percentage <= 80:
            # we have a line that falls within your criteria
            pass  # replace with your own handling
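To get the counts for all 10 bars rather than a single range, the same loop can fill a list of counters. This is only a sketch under some assumptions: count_per_bin is a hypothetical name, the bars are taken to be equal-width and half-open (0-9, 10-19, ..., with 100 placed in the last bar) so that a value like 80 is not counted twice, and lines whose last column is not a number between 0 and 100 are skipped.

```python
def count_per_bin(lines, bins=10):
    """Count how many percentages fall into each of `bins` equal-width ranges."""
    counts = [0] * bins
    width = 100 // bins
    for line in lines:
        columns = line.rsplit(None, 2)
        if len(columns) < 3:
            continue  # year or study header, not a data line
        try:
            percentage = int(columns[2])
        except ValueError:
            continue  # last column is not a number
        if not 0 <= percentage <= 100:
            continue  # assumption: ignore values outside 0-100
        # Integer division picks the bar; 100 is clamped into the last bar.
        index = min(percentage // width, bins - 1)
        counts[index] += 1
    return counts


sample = [
    "1999\n",
    "I\n",
    "Willem Jan van Steen 9859 77\n",
    "Guillaume Kielmann 5264 77\n",
    "Guillaume Bos 8200 6\n",
]
counts = count_per_bin(sample)
print(counts)
```

Because the function only iterates over its argument, an open file object can be passed in directly in place of the sample list. The resulting list holds one height per bar and can be handed to whatever plotting code you use.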