Tags: pandas, indexing, aggregate, apply

Is it possible to speed up this calculation without parallelization (Swifter, Parallel) and without looping over the index?


Without using parallelization (Swifter, Parallel), is it possible to do the calculation below in one pass instead of looping over the index, for example by using the "apply" function on the whole dataset?

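%%time
import random
import pandas as pd

df = pd.DataFrame({'A':random.sample(range(200), 200)})

# For each window size j, store the mean of the (up to) j values of 'A' preceding row i
for j in range(200):
    for i in df.index:
        df.loc[i,'A_last_{}'.format(j)] = df.loc[(df.index < i) & (df.index >= i - j),'A'].mean()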

Solution

  • %%time
    import random
    import pandas as pd

    df = pd.DataFrame({'A':random.sample(range(200), 200)})
    

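    First, calculate the running sums: column j holds, for each row, the sum of the j preceding values of 'A' (NaN where fewer than j preceding rows exist).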

    df[1] = df['A'].shift()
    for j in range(2, 200):
        df[j] = df[j-1].fillna(0) + df['A'].shift(j)
    

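    Then divide each column by its window size to get the means, and take care of the formatting. The forward fill along each row covers rows with fewer than j predecessors, which fall back to the mean of all earlier values.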

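    df = df.set_index('A')
    # Column labels are the window sizes 1..199, so dividing by them turns sums into means.
    # .ffill(axis=1) replaces fillna(method='ffill', axis=1), which is deprecated in recent pandas.
    df.divide(df.columns, axis=1)\
        .ffill(axis=1)\
        .rename(lambda x: f'A_last_{x}', axis=1)\
        .reset_index()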
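
As a possible alternative to the running-sum approach above, the same trailing means can be sketched with pandas' built-in rolling windows: shift 'A' by one row so that the current value is excluded, then take a rolling mean of width j with min_periods=1 so that rows with fewer than j predecessors use whatever earlier values exist. This is only a sketch, not the answerer's method. The helper name trailing_means and the fixed random seed are assumptions added for illustration.

```python
import random

import pandas as pd


def trailing_means(a: pd.Series, max_window: int) -> pd.DataFrame:
    """Mean of up to j preceding values of `a`, for j = 1..max_window.

    Hypothetical helper, not part of the original answer.
    """
    # Shift by one so that row i only sees a[i-1], a[i-2], ...
    prev = a.shift()
    cols = {
        # min_periods=1: use a shorter window near the top of the series
        f"A_last_{j}": prev.rolling(j, min_periods=1).mean()
        for j in range(1, max_window + 1)
    }
    # Build all columns in one concat instead of inserting them one by one
    return pd.concat(cols, axis=1)


random.seed(0)  # assumption: fixed seed so the example is reproducible
df = pd.DataFrame({"A": random.sample(range(200), 200)})
result = pd.concat([df, trailing_means(df["A"], 199)], axis=1)
```

Like the answer above, this skips the A_last_0 column from the question, which is all NaN because a window of size 0 contains no rows. The first row is NaN in every A_last_j column, since it has no preceding values.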