I want to create X number of new columns in a pandas dataframe based on an existing column of the dataframe. I would like to create new columns that shift the values in the original column by 1 at a time.
I wrote the following code for this purpose:
import pandas as pd
x = range(1,10000)
df = pd.DataFrame({'QObs':x})
for i in range(1,120):
nameQ = 'QObs' + str(i)
df[nameQ] = df['QObs'].shift(i)
However, I obtained the following message:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
df[nameQ] = df['QObs'].shift(i)
I tried using pd.concat and pd.join but I have similar problems:
df_new = pd.DataFrame()
for i in range(1,120):
nameQ = 'QObs' + str(i)
df_new[nameQ] = df['QObs'].shift(i)
df = pd.concat([df,df_new], axis=1)
This version takes much longer to run.
Thank you very much for your help!
Build your list first with a comprehension then concat once at the end:
qobs = [df['QObs'].shift(i).rename(f'QObs{i}') for i in range(1, 120)]
out = pd.concat([df['QObs'], *qobs], axis=1)
Output:
>>> out
QObs QObs1 QObs2 QObs3 QObs4 QObs5 ... QObs114 QObs115 QObs116 QObs117 QObs118 QObs119
0 23 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
1 89 23.0 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2 40 89.0 23.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
3 60 40.0 89.0 23.0 NaN NaN ... NaN NaN NaN NaN NaN NaN
4 30 60.0 40.0 89.0 23.0 NaN ... NaN NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
195 74 94.0 77.0 1.0 68.0 6.0 ... 28.0 7.0 19.0 74.0 46.0 46.0
196 2 74.0 94.0 77.0 1.0 68.0 ... 50.0 28.0 7.0 19.0 74.0 46.0
197 71 2.0 74.0 94.0 77.0 1.0 ... 77.0 50.0 28.0 7.0 19.0 74.0
198 52 71.0 2.0 74.0 94.0 77.0 ... 94.0 77.0 50.0 28.0 7.0 19.0
199 48 52.0 71.0 2.0 74.0 94.0 ... 69.0 94.0 77.0 50.0 28.0 7.0
[200 rows x 120 columns]