
Python - loop through file to call most recent row that contained a variable in current row


I'm trying to improve my Python skills and general basic coding. I have a csv file, the first 7 rows (including the header) of which are shown below:

HomeTeam     AwayTeam      HomeTeamWin     AwayTeamWin
AV           MU            1               0
BR           QPR           1               0
C            E             0               1
MU           BR            1               0
QPR          C             0               1
E            AV            0               1

I am trying to implement the following code such that an output file will be generated that shows, based on the result from their most recent game, if the home team was / was not coming off a win. I am stuck at the section marked with ******

#start loop
for row in file:
    #create empty list to put value we will find into
    observation_list=[]
    #define variable a as being row[0], i.e. the cell 
    #in the current row that contains the 'hometeam'
    a=row[0]
    #*****stuck here*******#
    #call the last row to contain variable a i.e. where toprow = the most recent row
    #above the current row to have contained variable a i.e. the value from row[0]
    for toprow in file:
    #*****stuck here*******#
        if (toprow[0] or toprow[1])==a: 
            #implement the following if statement
            #where toprow[0] is the 1st column containing the value
            #of the hometeam from the toprow
            if (toprow[0]==a):      
            #implement the following to generate an output file showing
            #1 or 0 for home team coming off a win
                b=toprow[2]
                observation_list.append(b)
                with open(Output_file, "ab") as resultFile:
                     writer = csv.writer(resultFile, lineterminator='\n')
                     writer.writerow(observation_list)  
            else (toprow[1]==a):
            #implement the following if statement
            #where toprow[1] is the 1st column containing the value
            #of the hometeam from the toprow
                b==toprow[3]
                observation_list.append(b])
            #implement the following to generate an output file showing
            #1 or 0 for home team coming off a win
                with open(Output_file, "ab") as resultFile:
                     writer = csv.writer(resultFile, lineterminator='\n')
                     writer.writerow(observation_list)

From what I have done and read thus far I can see there being two problems:

Problem 1: How can I get the second for loop, marked with ****, to iterate over the previously read rows until it reaches the most recent row to contain the variable defined by 'a'?

Problem 2: How do I start the code block from the 3rd row? The reason this needs to be done is to prevent A. reading the header and, more importantly, B. trying to read a non existent / negative row i.e. row1 - 1 = row0, row0 doesn't exist!?

NB the desired output file would be as follows:

-blank-      #first cell would be empty as there is no data to fill it
-blank-      #second cell would be empty as there is no data to fill it
-blank-      #third cell would be empty as there is no data to fill it
0            #fourth cell contains 0 as MU lost their most recent game
0            #fifth cell contains 0 as QPR lost their most recent game
1            #sixth cell contains 1 as E won their most recent game

Solution

  • A good thing to do is to write down, in words, the steps you think you need to take to solve the problem. For this problem I want to:

    1. Skip the first line of the file.
    2. Read a line and split it into its parts.
    3. If this is the home team's first game, print a blank; otherwise print the result of the last game it played.
    4. Repeat until the file is exhausted.

    While the file is being read, store the result of each team's most recently played game so it can be looked up later. Dictionaries are made for this: {team1 : result_of_last_game, team2 : result_of_last_game, ...}. When looking up a team's first game, there won't be a previous game, so a plain dictionary will raise a KeyError. The KeyError can be handled with a try/except block, or collections.defaultdict can be used to supply a default value instead.
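
To make the two options concrete, here is a minimal sketch of the lookup for a team that has no previous game. The team codes and results are illustrative values borrowed from the question's table, and the variable names are my own.

```python
import collections

# Plain dict: a missing key raises KeyError, so it has to be handled explicitly.
last_game = {'MU': '0'}
try:
    previous = last_game['QPR']
except KeyError:
    previous = ''   # first game for this team: nothing to report

# defaultdict: a missing key is created on lookup using the factory's value.
last_game_dd = collections.defaultdict(str, {'MU': '0'})
previous_dd = last_game_dd['QPR']   # str() returns an empty string
```

Note that the defaultdict lookup also stores the default under the missing key, which is harmless here because the key is overwritten as soon as the game's result is recorded.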

    I like to use operator.itemgetter when extracting items from a sequence - it makes the code a bit more readable for when I look at it later.
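
As a quick illustration of what itemgetter returns, here is a small sketch using the first data row from the question. The names fields, home_pair and away_pair are only for this example.

```python
import operator

home = operator.itemgetter(0, 2)   # HomeTeam and HomeTeamWin columns
away = operator.itemgetter(1, 3)   # AwayTeam and AwayTeamWin columns

fields = 'AV MU 1 0'.split()       # first data row, split on whitespace
home_pair = home(fields)           # a (team, result) tuple for the home side
away_pair = away(fields)           # a (team, result) tuple for the away side
```

With more than one index, itemgetter returns a tuple; with a single index it returns the item itself, which is why team(h) in the script gives back a plain string.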

    import operator, collections
    
    home = operator.itemgetter(0,2)    #first and third item
    away = operator.itemgetter(1,3)    #second and fourth item
    team = operator.itemgetter(0)      #first item
    
    #dictionary to hold the previous game's result
    #default will be a blank string
    last_game = collections.defaultdict(str)
    
    #string to format the output
    out = '{}\t{}'
    with open('data.txt') as f:
        #skip the header
        next(f)
        for line in f:
            #split the line into its relevant parts
            line = line.strip()
            line = line.split()
            #extract the team and game result
            #--> (team1, result), (team2, result)
            h, a = home(line), away(line)
            home_team = team(h)
            #print the result of the last game
            print(out.format(home_team, last_game[home_team]))
            #update the dictionary with the results of this game
            last_game.update([h,a])
    

    Instead of printing the results, you could easily write them to a file or collect them in a container and write them to a file later.
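
One way to do that, sketched under a few assumptions: the input is whitespace-separated as in the question, and the function names coming_off_win and write_results, along with the path arguments, are hypothetical names chosen for this example.

```python
import collections
import operator

home = operator.itemgetter(0, 2)   # (home team, home result)
away = operator.itemgetter(1, 3)   # (away team, away result)

def coming_off_win(lines):
    """Yield the home team's previous result for each data line (no header)."""
    last_game = collections.defaultdict(str)
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # ignore empty lines
        h, a = home(fields), away(fields)
        yield last_game[h[0]]             # '' if this is the team's first game
        last_game.update([h, a])          # record this game's results

def write_results(in_path, out_path):
    """Read the games file and write one result per line to out_path."""
    with open(in_path) as f, open(out_path, 'w') as out:
        next(f, None)                     # skip the header
        for result in coming_off_win(f):
            out.write(result + '\n')
```

Separating the lookup logic from the file handling keeps the generator easy to check on a list of strings before pointing it at a real file.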


    If you want something other than an empty string as the default, pass defaultdict any zero-argument callable that returns the value you want, for example:

    class Foo(object):
        def __init__(self, foo):
            self.__foo = foo
        def __call__(self):
            return self.__foo
    blank = Foo('-blank-')
    last_game = collections.defaultdict(blank)
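
A shorter alternative, if a dedicated class feels like too much: since defaultdict only needs a callable that takes no arguments, a lambda should do the same job. This is a sketch of that variation, not part of the original approach.

```python
import collections

# The factory is called with no arguments whenever a key is missing.
last_game = collections.defaultdict(lambda: '-blank-')
last_game['MU'] = '0'

first_game = last_game['QPR']   # missing key, so the factory's value is used
known_game = last_game['MU']    # existing key, stored value is returned
```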