python pandas dataframe for-loop concatenation

Size of output dataframe pandas

I created two input pandas dataframes:

    import numpy as np
    import pandas as pd

    mu, sigma = 4., 1.3
    s1 = pd.DataFrame(np.random.lognormal(mu, sigma, size=(1000, 1000)))
    min_value = s1.values.min()
    max_value = s1.values.max()
    s2 = pd.DataFrame(np.random.uniform(min_value, max_value, size=(1000, 1)))

Here s1 has 1000 columns of 1000 elements each, and s2 has a single column. I wrote a loop that compares each element of s2 with the element of s1 at the same row index, subtracts it where appropriate, and records the results in two output dataframes depending on a condition. After each pass, the updated s2 should go into the next pass, where the same is done against the next column of s1. This should be repeated once for every column of s1.

    r = pd.DataFrame(columns=['R'])
    c = pd.DataFrame(columns=['C'])

    for col in s1.columns:
        for idx, row in enumerate(s2.values):
            if (row >= 0.95*s1.iloc[idx, col]) and (row <= 1.05*s1.iloc[idx, col]):
                s2.iloc[idx,col] = 0
                r.iloc[idx,:] = 0
            elif row > s1.iloc[idx, col]:
                diff = row - s1.iloc[idx, col]
                s2.iloc[idx,col] = diff
                r.iloc[idx,0] = diff
                c.iloc[idx,0] = s2.iloc[idx, col]
            else:
                r.iloc[idx,0] = row

    result = pd.concat([s2, r, c], axis=1)

When I run this script I get the error:

    IndexError: iloc cannot enlarge its target object

I would like to update the dataframes c and r in a single column without overwriting earlier results. Can someone help me? Alternatively, it would be acceptable to record the results of each loop in a new column of c and r.

Solution

Your problem is in assignments such as r.iloc[idx,0] = diff and c.iloc[idx,0] = s2.iloc[idx, col]. r and c are created empty, and iloc can only write to positions that already exist, so you are trying to assign a value to a row that is not in the DataFrame yet.

One solution is to assign with loc instead, which adds a new row to r and c whenever the row label does not exist yet.
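To make the difference concrete, here is a minimal sketch on an empty one-column frame like r. The value 1.5 is arbitrary and only there for illustration.

```python
import pandas as pd

r = pd.DataFrame(columns=['R'])  # empty frame: one column, no rows

try:
    r.iloc[0, 0] = 1.5           # position 0 does not exist yet
except IndexError as exc:
    iloc_error = str(exc)        # the same message as in the question

r.loc[0, 'R'] = 1.5              # the row labelled 0 is created on assignment
```

After this, r has exactly one row, and iloc_error holds the "cannot enlarge" message.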

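Here is your code updated to use loc. Since s2 has only one column, it is read and written at column position 0 rather than col (s2.iloc[idx, col] would raise the same IndexError for col >= 1). It should work, though I have not run it on the full data:

    r = pd.DataFrame(columns=['R'])
    c = pd.DataFrame(columns=['C'])

    for col in s1.columns:
        for idx in range(len(s2)):
            value = s2.iloc[idx, 0]      # current (possibly updated) s2 value
            target = s1.iloc[idx, col]   # s1 value in the same row
            if 0.95 * target <= value <= 1.05 * target:
                # within 5% of the target: treat as an exact match
                s2.iloc[idx, 0] = 0
                r.loc[idx, 'R'] = 0
            elif value > target:
                diff = value - target
                s2.iloc[idx, 0] = diff
                r.loc[idx, 'R'] = diff
                # one new column of c per s1 column
                c.loc[idx, 'C' + str(col)] = diff
            else:
                r.loc[idx, 'R'] = value

    result = pd.concat([s2, r, c], axis=1)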
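
The row-by-row loop above performs about a million single-cell assignments for the sizes in the question, which will be slow. As a rough sketch only, and assuming each row depends on nothing but its own running value, the same three rules could be applied to a whole column at a time with NumPy masks. The helper name run_columns and the small example frames are made up for illustration. R holds the value left over after the last column, and each C column holds the difference where the value exceeded the target (NaN elsewhere).

```python
import numpy as np
import pandas as pd

def run_columns(s1, s2):
    """Hypothetical helper: apply the per-column rules to all rows at once."""
    remaining = s2.iloc[:, 0].to_numpy(dtype=float).copy()
    c_cols = {}
    for col in s1.columns:
        target = s1[col].to_numpy(dtype=float)
        # rule 1: value within 5% of the target -> set to 0
        close = (remaining >= 0.95 * target) & (remaining <= 1.05 * target)
        # rule 2: value above the target -> keep the difference
        above = ~close & (remaining > target)
        diff = remaining - target
        c_cols['C' + str(col)] = np.where(above, diff, np.nan)
        # rule 3 (otherwise): value is left unchanged
        remaining = np.where(close, 0.0, np.where(above, diff, remaining))
    r = pd.DataFrame({'R': remaining}, index=s2.index)
    c = pd.DataFrame(c_cols, index=s2.index)
    return pd.concat([r, c], axis=1)

# Tiny made-up example: 3 rows, 2 columns in s1
s1_small = pd.DataFrame([[10.0, 1.0], [2.0, 1.0], [100.0, 50.0]])
s2_small = pd.DataFrame([[10.2], [7.0], [3.0]])
out = run_columns(s1_small, s2_small)
```

In the small example, the first row matches its target within 5% and drops to 0, the second row is reduced by 2 and then by 1, and the third row is never above its target and stays at 3.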