python pandas dataframe performance concatenation

Issue creating new columns based on existing columns in DataFrame - PerformanceWarning: DataFrame is highly fragmented

I want to create X number of new columns in a pandas dataframe based on an existing column of the dataframe. I would like to create new columns that shift the values in the original column by 1 at a time.

I wrote the following code for this purpose:

import pandas as pd

x = range(1,10000)
df = pd.DataFrame({'QObs':x})

for i in range(1,120):
    nameQ = 'QObs' + str(i)
    df[nameQ] = df['QObs'].shift(i)

However, I obtained the following message:

PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead.  To get a de-fragmented frame, use `newframe = frame.copy()`
  df[nameQ] = df['QObs'].shift(i)

I tried using pd.concat and pd.join but I have similar problems:

df_new = pd.DataFrame()
for i in range(1,120):
    nameQ = 'QObs' + str(i)
    df_new[nameQ] = df['QObs'].shift(i)
    df = pd.concat([df,df_new], axis=1)

This version takes much longer to run.

Thank you very much for your help!

Solution

Build your list first with a comprehension then concat once at the end:

qobs = [df['QObs'].shift(i).rename(f'QObs{i}') for i in range(1, 120)]
out = pd.concat([df['QObs'], *qobs], axis=1)

Output:

>>> out
     QObs  QObs1  QObs2  QObs3  QObs4  QObs5  ...  QObs114  QObs115  QObs116  QObs117  QObs118  QObs119
0      23    NaN    NaN    NaN    NaN    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
1      89   23.0    NaN    NaN    NaN    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
2      40   89.0   23.0    NaN    NaN    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
3      60   40.0   89.0   23.0    NaN    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
4      30   60.0   40.0   89.0   23.0    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
..    ...    ...    ...    ...    ...    ...  ...      ...      ...      ...      ...      ...      ...
195    74   94.0   77.0    1.0   68.0    6.0  ...     28.0      7.0     19.0     74.0     46.0     46.0
196     2   74.0   94.0   77.0    1.0   68.0  ...     50.0     28.0      7.0     19.0     74.0     46.0
197    71    2.0   74.0   94.0   77.0    1.0  ...     77.0     50.0     28.0      7.0     19.0     74.0
198    52   71.0    2.0   74.0   94.0   77.0  ...     94.0     77.0     50.0     28.0      7.0     19.0
199    48   52.0   71.0    2.0   74.0   94.0  ...     69.0     94.0     77.0     50.0     28.0      7.0

[200 rows x 120 columns]