Tags: python, python-3.x, pandas, dataframe, cell

Insert data into specific locations in a DataFrame with DataFrame.loc in pandas (Python)


I tried inserting data into specific locations in a DataFrame in two ways. Option 1 uses a fixed column label and a variable index label; Option 2 uses a fixed index label and a variable column label. Option 1 runs without any warning, but Option 2 produces this warning:

PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`

Option 1: no warning

import pandas as pd

df = pd.DataFrame()

# Fixed column label 'A'; a new row label on each iteration.
for row in range(200):
    df.loc[str(row), 'A'] = str(row)

Option 2: PerformanceWarning

df1 = pd.DataFrame()
# Fixed row label 'A'; a new column label on each iteration.
for col in range(200):
    df1.loc['A', str(col)] = str(col)

Solution

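  • pandas is not designed for adding data one cell at a time in a loop: every enlargement builds an expensive intermediate frame. In Option 2 each iteration also inserts a new column, which is what fragments the frame and triggers the PerformanceWarning. Option 1 adds rows instead, so it avoids the warning, but it still copies data on each iteration.

    Instead, loop over a low-level object (a list or dictionary) and construct the DataFrame once at the end.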

    First code (Option 1: a single column 'A' with 200 rows):

    df = pd.DataFrame({'A': {str(i):i for i in range(200)}})
    

    Second code (Option 2: a single row 'A' with 200 columns):

    df1 = pd.DataFrame.from_dict({'A': {str(i):i for i in range(200)}}, orient='index')
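
    The same idea can be sketched with an explicit loop, for cases where the values are computed step by step rather than in a single comprehension. This is a minimal sketch, not the only way to do it: the names `values`, `pieces` and `df2` are made up for illustration, and the values are kept as strings to mirror the loops in the question. The last part shows the `pd.concat(axis=1)` route that the warning message itself suggests, assuming the columns already exist as separate Series.

```python
import warnings

import pandas as pd

# Collect the values in a plain dict inside the loop (cheap),
# instead of enlarging a DataFrame cell by cell (expensive).
values = {}
for i in range(200):
    values[str(i)] = str(i)  # strings, as in the question's loops

# Option 1 layout: one column 'A', rows labelled '0'..'199'.
df = pd.DataFrame({'A': values})

# Option 2 layout: one row 'A', columns labelled '0'..'199'.
with warnings.catch_warnings():
    # Promote the fragmentation warning to an error to show that
    # building the frame in one step does not trigger it.
    warnings.simplefilter('error', pd.errors.PerformanceWarning)
    df1 = pd.DataFrame.from_dict({'A': values}, orient='index')

# Alternative for Option 2: if each column is already a Series,
# join them all in a single concat call.
pieces = [pd.Series([str(i)], index=['A'], name=str(i)) for i in range(200)]
df2 = pd.concat(pieces, axis=1)

print(df.shape, df1.shape, df2.shape)  # (200, 1) (1, 200) (1, 200)
```

    In each case the DataFrame is created exactly once, so no column is inserted into an existing frame and the fragmentation warning has nothing to report.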