Search code examples
pythonexcelpandasdataframeexcel-tables

How can I get pair of negative + positive value from a .xlsx table using python


I have an excel files that looks like this :

N_order Amount
1137 258
1137 -258
1137 258
1138 802
1139 983
1139 837
1139 -837
1139 -248

I want to do pair with negative and positive value, but only pair, as an example I want them to be linked like that :

N_order Amount Link
1137 258 linked
1137 -258 linked
1137 258
1138 802
1139 983
1139 837 linked
1139 -837 linked
1139 -248

how can I do it ? I tried this code :

df['link'] = ""
for i in range(len(df)):
    commande = df.at[i, 'N_order']
    montant = df.at[i, 'Amount']
    if montant > 0:
        match = df.loc[(df['N_order'] == commande) & (df['Amount'] == -montant)]
        if not match.empty:
            df.at[i, 'link'] = commande
            df.at[match.index[0], 'link'] = commande
df = df[(df['N_order'] != '') & (df['link'] != '') & (df['Amount'] != '')][['N_order', 'Amount', 'link']]
df.to_excel("fileoutput.xlsx", index=False)
df

but it makes my table looks like that :

N_order Amount Link
1137 258 linked
1137 -258 linked
1137 258 linked
1138 802
1139 983
1139 837 linked
1139 -837 linked
1139 -248

(there is 1750 rows, so I can't just hand correct it.)

How can I do it ? Thanks you !


Solution

  • De-duplicate (with groupby.cumcount) and pivot the absolute values and sign to identify the matching indices, then assign the comment:

    import numpy as np
    
    idx = (df
       .assign(n=df.groupby(['N_order', 'Amount']).cumcount(),
               sign=np.sign(df['Amount']),
               abs=df['Amount'].abs()
              )
       .reset_index()
       .pivot(index=['N_order', 'n', 'abs'], columns='sign', values='index')
       .reindex([-1, 1], axis=1)
       .dropna().to_numpy().ravel()
    )
    
    df.loc[idx, 'Link'] = 'linked'
    

    Output:

       N_order  Amount    Link
    0     1137     258  linked
    1     1137    -258  linked
    2     1137     258     NaN
    3     1138     802     NaN
    4     1139     983     NaN
    5     1139     837  linked
    6     1139    -837  linked
    7     1139    -248     NaN
    

    Intermediate pivot of the indices:

    sign            -1    1
    N_order n abs          
    1137    0 258  1.0  0.0
            1 258  NaN  2.0
    1138    0 802  NaN  3.0
    1139    0 248  7.0  NaN
              837  6.0  5.0
              983  NaN  4.0
    
    bonus: matching the paired value

    With a minor modification you can also match the index of the paired values:

    df2 = (df.assign(n=df.groupby(['N_order', 'Amount']).cumcount(),
               sign=np.sign(df['Amount']),
               abs=df['Amount'].abs()
              )
       .reset_index()
       .pivot(index=['N_order', 'n', 'abs'], columns='sign', values='index')
       .reindex([-1, 1], axis=1)
       .dropna().reset_index(drop=True)
    )
    
    df['matching_index'] = pd.concat([df2.set_index(-1)[1], df2.set_index(1)[-1]])
    

    Output:

       N_order  Amount  matching_index
    0     1137     258             1.0
    1     1137    -258             0.0
    2     1137     258             NaN
    3     1138     802             NaN
    4     1139     983             NaN
    5     1139     837             6.0
    6     1139    -837             5.0
    7     1139    -248             NaN