How to avoid coding if statement twice


I program mostly by myself so nobody checks my code. I feel like I've developed a bunch of bad habits.

The code I am pasting here works, but I would like to hear some other solutions.

I create a dictionary called teams_shots and iterate through a pandas DataFrame in which each row holds the name of the home team and the away team. I want to keep track of the shots made by each team that appears in the DataFrame, so I check whether home_team_name or away_team_name already has an entry in the dictionary and create one if it does not. For each match I also record the team's average shots over its earlier matches, or None the first time the team appears.

for index,match in df.iterrows():
    if match['home_team_name'] not in teams_shots:
        #we have to setup an entry in the dictionary
        teams_shots[match['home_team_name']]=[]
        teams_shots[match['home_team_name']].append(match['home_team_shots'])
        home_shots_avg.append(None)
    else:
        home_shots_avg.append(np.mean(teams_shots[match['home_team_name']]))
        teams_shots[match['home_team_name']].append(match['home_team_shots'])

    if match['away_team_name'] not in teams_shots:
        teams_shots[match['away_team_name']]=[]
        teams_shots[match['away_team_name']].append(match['away_team_shots'])
        away_shots_avg.append(None)
    else:
        away_shots_avg.append(np.mean(teams_shots[match['away_team_name']])) 
        teams_shots[match['away_team_name']].append(match['away_team_shots'])

As you can see, almost the same code is written twice, which is not a sign of good programming. I thought about using an or operator in the if statement, but then one of the two entries might already exist and I would overwrite it. Any ideas on how to write this code better?


Solution

  • In this case I think an additional for loop should do the trick:

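    for index, match in df.iterrows():
        # pair each team-name column with its shots column and its output list
        for name_col, shots_col, shots_avg in (
                ('home_team_name', 'home_team_shots', home_shots_avg),
                ('away_team_name', 'away_team_shots', away_shots_avg)):
            team = match[name_col]
            if team not in teams_shots:
                # first appearance: we have to set up an entry in the dictionary
                teams_shots[team] = []
                shots_avg.append(None)
            else:
                # average over the matches this team has already played
                shots_avg.append(np.mean(teams_shots[team]))
            # record this match's shots only after the average has been taken
            teams_shots[team].append(match[shots_col])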
    

    But there might be a way to handle this in a vectorized fashion.
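
One possible vectorized version is sketched below. It is not the original poster's method: it stacks the home and away sides into one long table and uses groupby with cumsum and cumcount to get each team's average over its earlier matches. The function name add_prior_shot_averages and the output column names are made up for illustration. The sketch assumes that df has a unique index, that rows are already in match order, and that a team never plays itself. First appearances come out as NaN rather than None.

```python
import pandas as pd


def add_prior_shot_averages(df):
    """Return a copy of df with each team's mean shots over its earlier matches."""
    # Stack home and away sides into one long table; the original row index is kept.
    home = df[['home_team_name', 'home_team_shots']].rename(
        columns={'home_team_name': 'team', 'home_team_shots': 'shots'}
    ).assign(side='home')
    away = df[['away_team_name', 'away_team_shots']].rename(
        columns={'away_team_name': 'team', 'away_team_shots': 'shots'}
    ).assign(side='away')
    # A stable sort on the index restores match order (home before away in a row).
    long = pd.concat([home, away]).sort_index(kind='mergesort')

    grouped = long.groupby('team')['shots']
    prior_count = grouped.cumcount()              # matches this team played before
    prior_sum = grouped.cumsum() - long['shots']  # shots in those earlier matches
    # where() turns a zero count into NaN, so a first appearance yields NaN.
    long['avg'] = prior_sum / prior_count.where(prior_count > 0)

    out = df.copy()
    # Assignment aligns on the original (unique) row index.
    out['home_shots_avg'] = long.loc[long['side'] == 'home', 'avg']
    out['away_shots_avg'] = long.loc[long['side'] == 'away', 'avg']
    return out
```

Calling out = add_prior_shot_averages(df) leaves df untouched and returns the two average columns alongside the original ones, so the result can be compared against the lists built by the loop.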