Below is the code and im just showing for 2015 to 2016 as an example. Im actually writing the same code for 2015-2019 dataset. This works for me but as you can see, its long winded as i have to keep repeating the codes(till 2019 dataset).
Question: 1. What will be the better and efficient way of writing this?
df2015 = pd.read_csv('EPL_20152016.csv', parse_dates=['Date'], dayfirst=True,
usecols=['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR', 'BbAv>2.5','BbAv<2.5'])
df2015.rename(columns={'BbAv>2.5': 'Avg>2.5', 'BbAv<2.5': 'Avg<2.5'},inplace=True)
df2015['FTTG'] = df2015['FTHG'] + df2015['FTAG']
df2015['%Avg>2.5'] = 100* (1 / df2015['Avg>2.5'])
df2015['%Avg<2.5'] = 100* (1 / df2015['Avg<2.5'])
df2015['%TotalAvg><2.5'] = df2015['%Avg>2.5'] + df2015['%Avg<2.5']
df2015['%Vig><2.5'] = df2015['%TotalAvg><2.5'] - 100
#df2015 = df2015[['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTTG', 'FTR','Avg>2.5','Avg<2.5',
df2015 = df2015.reindex(columns=['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTTG', 'FTR','Avg>2.5','Avg<2.5','%Avg>2.5', '%Avg<2.5', '%TotalAvg><2.5', '%Vig><2.5'])
df2016 = pd.read_csv('EPL_20162017.csv', parse_dates=['Date'], dayfirst=True,
usecols=['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR', 'BbAv>2.5','BbAv<2.5'])
df2016.rename(columns={'BbAv>2.5': 'Avg>2.5', 'BbAv<2.5': 'Avg<2.5'},inplace=True)
df2016['FTTG'] = df2016['FTHG'] + df2016['FTAG']
df2016['%Avg>2.5'] = 100* (1 / df2016['Avg>2.5'])
df2016['%Avg<2.5'] = 100* (1 / df2016['Avg<2.5'])
df2016['%TotalAvg><2.5'] = df2016['%Avg>2.5'] + df2016['%Avg<2.5']
df2016['%Vig><2.5'] = df2016['%TotalAvg><2.5'] - 100
df2016 = df2016.reindex(columns=['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTTG', 'FTR','Avg>2.5','Avg<2.5','%Avg>2.5', '%Avg<2.5', '%TotalAvg><2.5', '%Vig><2.5'])
Same code done for 2017,2018 and 2019 datasets. And then i use pd.concat and .set_index
df20152019 = pd.concat([df2015, df2016, df2017, df2018, df2019]).set_index('Date')
After reading your files with pd.read_csv() into different dataframes, you can try adding them in a list. Then have a for loop that loops over each dataframe in that list to perform the desired operations. For example, you could try something like this :
df_list = [df2015, df2016, df2017, df2018, df2019]
for df in df_list:
df.rename(columns={'BbAv>2.5': 'Avg>2.5', 'BbAv<2.5': 'Avg<2.5'},inplace=True)
df['FTTG'] = df['FTHG'] + df['FTAG']
df['%Avg>2.5'] = 100* (1 / df['Avg>2.5'])
df['%Avg<2.5'] = 100* (1 / df['Avg<2.5'])
df['%TotalAvg><2.5'] = df['%Avg>2.5'] + df['%Avg<2.5']
df['%Vig><2.5'] = df['%TotalAvg><2.5'] - 100
df = df.reindex(columns=['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG','FTTG', 'FTR','Avg>2.5','Avg<2.5','%Avg>2.5', '%Avg<2.5', '%TotalAvg><2.5', '%Vig><2.5'])