Search code examples
pythonpandasnatsort

Create a new pandas column of indexes from natsort


Given this input

df = pd.DataFrame(
                [["1 (2)", "4"], ["5 (3)", "2"], ["4 (2)", "4"], ["1 (1)", "4"], ["1 (2)", "3"]],
                columns=["a", "b"],
            )
    a       b
0   1 (2)   4
1   5 (3)   2
2   4 (2)   4
3   1 (1)   4
4   1 (2)   3

to arrive at this output

expected =  pd.DataFrame(
                [["1 (2)", "4", 2], ["5 (3)", "2", 0], ["4 (2)", "4", 1], ["1 (1)", "4", 4], ["1 (2)", "3", 3]],
                columns=["a", "b", "c"],
            )
  a        b   c
0   1 (2)   4   2
1   5 (3)   2   0
2   4 (2)   4   1
3   1 (1)   4   4
4   1 (2)   3   3

Where the c column contains the order of the items in column a which is natsorted. I need to do this:

indexes_human_sorted = natsort.index_humansorted(df["a"], reverse=True) # [1, 2, 0, 4, 3] gets the values in order x[1], x[2], x[0] etc.
for i, x in enumerate(indexes_human_sorted):
    df.loc[x, "c"] = i

    a       b  c
0   1 (2)   4   2.0
1   5 (3)   2   0.0
2   4 (2)   4   1.0
3   1 (1)   4   4.0
4   1 (2)   3   3.0

I believe that loop (which is slow) over the indexes should be unnecessary and I think I'm missing some trick from natsort.


Solution

  • you can use np.argsort I believe:

    df['c'] = np.argsort(natsort.index_humansorted(df['a'],reverse=True))
    

    print(df)
    
           a  b  c
    0  1 (2)  4  2
    1  5 (3)  2  0
    2  4 (2)  4  1
    3  1 (1)  4  4
    4  1 (2)  3  3