I have an original dataframe in which I want to overwrite a few columns within a loop. In the end, I want to stack all dataframes (generated in the loop) into one big dataframe (which contains then x times the original dataframe with changes here and there. The stacking of dfs doesn't really seem to work for me (and the multiplication of column A with i either (adding with i did work))
With the 'ID'-column I keep track of all different dataframes in the big one since they are kind of visibly together when looking at the ID's.
Tried code:
import pandas as pd
df_or = pd.DataFrame({"Case": [1,2,3,4,5,6],
"A": [3,5,2,8,4,1],
"B": [10,12,24,8,57,84]})
print(df_or)
total = []
for i in range(0,2):
df = df_or
df.loc[:, 'A'] = df_or.loc[:, 'A'].mul(i) #.add(i) works fine
df.loc[:, 'ID'] = df.loc[:,'Case'] + i*100000
print(df)
total.append(df)
total = pd.concat(total)
total = total.sort_values('ID')
total.reset_index(inplace=True, drop=True)
print(total)
Desired result:
Case A B ID
0 1 0 10 1
1 2 0 12 2
2 3 0 24 3
3 4 0 8 4
4 5 0 57 5
5 6 0 84 6
6 1 3 10 100001
7 2 5 12 100002
8 3 2 24 100003
9 4 8 8 100004
10 5 4 57 100005
11 6 1 84 100006
Instead I get:
Case A B ID
0 1 0 10 100001
1 1 0 10 100001
2 2 0 12 100002
3 2 0 12 100002
4 3 0 24 100003
5 3 0 24 100003
6 4 0 8 100004
7 4 0 8 100004
8 5 0 57 100005
9 5 0 57 100005
10 6 0 84 100006
11 6 0 84 100006
You can try this:
df_or['id'] = [i for i in range(1, len(df_or['Case'])+1)]
df1 = df_or.copy()
df_or['id'] = ['10000'+str(i) for i in range(1, len(df_or['Case'])+1)]
df = pd.concat([df1, df_or])
print(df)
Case A B id
0 1 3 10 1
1 2 5 12 2
2 3 2 24 3
3 4 8 8 4
4 5 4 57 5
5 6 1 84 6
0 1 3 10 100001
1 2 5 12 100002
2 3 2 24 100003
3 4 8 8 100004
4 5 4 57 100005
5 6 1 84 100006
Or to fix your code, you need to use df = df_or.copy()
:
total = []
for i in range(0,2):
df = df_or.copy()
df.loc[:, 'A'] = df_or.loc[:, 'A'].mul(i) #.add(i) works fine
df.loc[:, 'ID'] = df.loc[:,'Case'] + i*100000
print(df)
total.append(df)
total = pd.concat(total)
total = total.sort_values('ID')
total.reset_index(inplace=True, drop=True)
print(total)
Case A B ID
0 1 0 10 1
1 2 0 12 2
2 3 0 24 3
3 4 0 8 4
4 5 0 57 5
5 6 0 84 6
6 1 3 10 100001
7 2 5 12 100002
8 3 2 24 100003
9 4 8 8 100004
10 5 4 57 100005
11 6 1 84 100006