Search code examples
pythonpandasdefault-valueidentity-column

Define column values to be selected / disselected as default


I would like to automate selecting of values in one column - Step_ID. Insted of defining which Step_ID i would like to filter (shown in the code below) i would like to define, that the first Step_ID and the last Step_ID are being to excluded.

df = df.set_index(['Step_ID'])

df.loc[df.index.isin(['Step_2','Step_3','Step_4','Step_5','Step_6','Step_7','Step_8','Step_9','Step_10','Step_11','Step_12','Step_13','Step_14','Step_15','Step_16','Step_17','Step_18','Step_19','Step_20','Step_21','Step_22','Step_23','Step_24'])]

Is there any option to exclude the first and last value in the column? In this example Step_1 and Step_25. Or include all values expect of the first and the last value? In this example Step_2-Step_24.

The reason for this is that files have different numbers of ''Step_ID''. Since I don't have to redefine it all the time I would like to have a solution that simplify filtering of those. It is necessary to exclude the first and last value in the column 'Step_ID', but the number of the STEP_IDs is always different.

By Step_1 - Step_X, I need to have Step_2 - Step_(X-1).


Solution

  • Use:

    df = pd.DataFrame({
        'Step_ID': ['Step_1','Step_1','Step_2','Step_2','Step_3','Step_4','Step_5',
                    'Step_6','Step_6'],
        'B': list(range(9))})
    print (df)
      Step_ID  B
    0  Step_1  0
    1  Step_1  1
    2  Step_2  2
    3  Step_2  3
    4  Step_3  4
    5  Step_4  5
    6  Step_5  6
    7  Step_6  7
    8  Step_6  8
    

    Select all index values without first and last index values extracted by slicing df.index[[0, -1]]:

    df = df.set_index(['Step_ID'])
    df = df.loc[~df.index.isin(df.index[[0, -1]].tolist())]
    print (df)
             B
    Step_ID   
    Step_2   2
    Step_2   3
    Step_3   4
    Step_4   5
    Step_5   6