python, pandas, point-clouds

How to keep all rows whose SeqNumber meets conditions on a sample of a pandas dataframe


I have a dataframe like this:

   SeqNumber   X  Y   Z
0         12   4  5   5
1         12   7  5  -8
2         13  10  2   1
3         16   4  8   7
...

I would like to find the SeqNumbers that have a positive Z value within a sample bounded by X_min, X_max and Y_min, Y_max, and then keep only the rows with those SeqNumbers in the whole dataframe. How can I do that using .loc?

For example, with x_min = 3, x_max = 8, y_min = 4 and y_max = 6, only the first 2 rows fall inside the sample. Of those rows, only the first one has a positive Z. I then want to keep every row that shares the SeqNumber of that first row (12), so the result would be a dataframe containing the first 2 rows of the original.


Solution

  • Set x_min, x_max, y_min, y_max (hard-coded here), select the SeqNumber of the rows that match all your conditions, then keep every row whose SeqNumber is in that selection:

    x_min = 3
    x_max = 8
    y_min = 4
    y_max = 6
    
    # SeqNumbers with Z > 0 inside the X/Y window (between is inclusive by default)
    idx = df.loc[df['Z'].gt(0) & df['X'].between(x_min, x_max)
                                & df['Y'].between(y_min, y_max),
                 'SeqNumber'].values
    
    out = df.loc[df['SeqNumber'].isin(idx)]
    

    Output:

    >>> idx
    array([12])
    
    >>> out
       SeqNumber  X  Y  Z
    0         12  4  5  5
    1         12  7  5 -8
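
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the four rows shown in the question, and the helper name `keep_matching_sequences` is hypothetical, introduced only for illustration; it is not part of pandas.

```python
import pandas as pd

# Sample data copied from the four rows shown in the question.
df = pd.DataFrame({
    "SeqNumber": [12, 12, 13, 16],
    "X": [4, 7, 10, 4],
    "Y": [5, 5, 2, 8],
    "Z": [5, -8, 1, 7],
})


def keep_matching_sequences(df, x_min, x_max, y_min, y_max):
    """Hypothetical helper: keep every row whose SeqNumber has at least one
    row with Z > 0 inside the [x_min, x_max] x [y_min, y_max] window."""
    # Rows that fall inside the X/Y window (bounds are inclusive).
    in_window = df["X"].between(x_min, x_max) & df["Y"].between(y_min, y_max)
    # SeqNumbers of the windowed rows that also have a positive Z.
    hits = df.loc[in_window & df["Z"].gt(0), "SeqNumber"].unique()
    # Keep all rows of the whole dataframe that share those SeqNumbers.
    return df.loc[df["SeqNumber"].isin(hits)]


out = keep_matching_sequences(df, x_min=3, x_max=8, y_min=4, y_max=6)
print(out)
```

Note that `between` includes both bounds by default; if the window should be strict, recent pandas versions accept `inclusive="neither"` (or `"left"` / `"right"`).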