I am trying to perform a nested loop thorugh a dataframe and I am really really new in using python. Somehow looking through the google I found many examples but the final one which I need. I used iterrows to loop over the dataframe and index on the date using only data which has the same date. That works. Now I want the nested loop but don't know how that works with iterrows? The code looks like the folloiwng:
import pandas as pd
df = pd.read_csv('C:/Files_Employees.csv', encoding='cp1252', sep=';', index_col=0).dropna()
for current_date in df.index.unique():
print('calculating date: ' +str(current_date))
for index, row in df.iterrows():
if index == current_date:
print(row['Person'])
I did it via a nested loop but here I am not sure how i could do the indexing as showed above and somehow the expected results are wrong. The code looks like the following:
import pandas as pd
df = pd.read_csv('C:/Files_Employees.csv', encoding='cp1252', sep=';', index_col=0).dropna()
df2 = pd.DataFrame([])
for i in range(0, len(df)):
for j in range(i+1, len(df)):
if df.iloc[i]['Working Group'] == df.iloc[j]['Working Group']:
working_hours = df.iloc[i]['Working Hours'] + df.iloc[j]['Working Hours']
print(df.iloc[i]['Working Group'], working_hours)
If an example is needed I can include one.
The example file looks like the following:
working_date Working Group Person Working Hours Country
2017-07-14 1 Mike 59 USA
2017-07-14 2 Molly 60 USA
2017-07-14 3 Dennis 45 USA
2017-07-14 4 Pablo 45 USA
2017-07-14 1 Jeff 42 USA
2017-07-14 2 Emily 55 USA
2017-07-14 3 Sophia 46 USA
2017-07-14 4 Alice 41 USA
2017-07-14 1 Ethan 57 USA
2017-07-14 2 Alexander 59 USA
2017-07-14 3 Edward 41 USA
2017-07-14 4 Daniel 46 USA
2017-07-15 1 Mike 59 USA
2017-07-15 2 Molly 59 USA
2017-07-15 3 Dennis 61 USA
2017-07-15 4 Pablo 58 USA
2017-07-15 1 Jeff 58 USA
2017-07-15 2 Emily 51 USA
2017-07-15 3 Sophia 65 USA
2017-07-15 4 Alice 53 USA
2017-07-15 1 Ethan 49 USA
2017-07-15 2 Alexander 61 USA
2017-07-15 3 Edward 56 USA
2017-07-15 4 Daniel 65 USA
The final outpout should be the like the following, which summs in the nested loop every working group together, e.g. Working_Group one for working_date 2017-07-14 is 59+42+57 = 158:
working_date Working Group Working Hours Country
2017-07-14 1 158 USA
2017-07-14 2 174 USA
2017-07-14 3 132 USA
2017-07-14 4 132 USA
2017-07-15 1 166 USA
2017-07-15 2 171 USA
2017-07-15 3 182 USA
2017-07-15 4 176 USA
With Pandas, you should use vectorised operations. Here you can simply use GroupBy
+ sum
:
res = df.groupby(['working_date', 'WorkingGroup', 'Country']).sum().reset_index()
#alternative
res = (df.groupby(['working_date','Working Group', 'Country'], as_index=False)
['Working Hours'].sum())
print(res)
working_date WorkingGroup Country WorkingHours
0 2017-07-14 1 USA 158
1 2017-07-14 2 USA 174
2 2017-07-14 3 USA 132
3 2017-07-14 4 USA 132
4 2017-07-15 1 USA 166
5 2017-07-15 2 USA 171
6 2017-07-15 3 USA 182
7 2017-07-15 4 USA 176