Tags: pandas, dataframe, number-formatting, pandas-groupby

Series of if statements applied to data frame


I have a question about how to do this task. I want to group the numbers in the 'PD' column of my data frame, which range from .001 to 1. Specifically, I want to map values with .91>'PD'>.9 to .91 (i.e. return a value of .91), values with .92>'PD'>=.91 to .92, ..., and values with 1>='PD'>=.99 to 1, and store the result in a column named 'Grouping'. So far I have been writing each if statement by hand and then merging the result with the base data frame. Can anyone please help me with a more efficient way of doing this? I am still in the early stages of using Python, so sorry if the question seems easy. Thank you for answering and for your time.


Solution

  • Suppose your data looks like this:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({'PD': np.arange(0.001, 1, 0.001), 'data': np.random.randint(10, size=999)})
    >>> df.head()
          PD  data
    0  0.001     6
    1  0.002     3
    2  0.003     5
    3  0.004     9
    4  0.005     7
    

    Then cut off the last decimal of the PD column. This is a bit tricky, because truncating without a str conversion runs into floating-point rounding issues. One way to do it:

    >>> df['PD'] = df['PD'].apply(lambda x: float('{:.3f}'.format(x)[:-1]))
    >>> df.tail()
           PD  data
    994  0.99     1
    995  0.99     3
    996  0.99     2
    997  0.99     1
    998  0.99     0
    

    Now you can use pandas groupby and aggregate the data however you want, e.g.

    >>> df.groupby('PD').agg(lambda x: ','.join(map(str, x)))
                         data
    PD                       
    0.00    6,3,5,9,7,3,6,8,4
    0.01  3,5,7,0,4,9,7,1,7,1
    0.02  0,0,9,1,5,4,1,6,7,3
    0.03  4,4,6,4,6,5,4,4,2,1
    0.04  8,3,1,4,6,5,0,6,0,5
    [...]
    

    Note that the first row is one item shorter because my sample does not contain 0.000.
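
The string truncation above returns the lower edge of each bin, while the question asks for the upper edge in a 'Grouping' column. As a rough sketch of one alternative (not the method used in the answer), the values can be converted to whole thousandths first, so that the binning is done on integers and float noise cannot move a value into the neighbouring bin. This assumes 'PD' has at most three decimal places, as in the sample, and that the lower bound of every bin is inclusive; the column name PD_floor and the helper variable thousandths are made up for illustration.

```python
import pandas as pd

# Small hand-picked sample; assumes 'PD' has at most three decimal places.
df = pd.DataFrame({'PD': [0.001, 0.905, 0.91, 0.919, 0.99, 0.999, 1.0]})

# Work in whole thousandths (integers) to sidestep floating-point noise.
thousandths = (df['PD'] * 1000).round().astype(int)

# Lower bin edge: same grouping key as the string truncation in the answer.
df['PD_floor'] = (thousandths // 10) / 100

# Upper bin edge, as asked in the question: [.90, .91) -> .91, [.91, .92) -> .92, ...
# clip keeps PD == 1 in the last bin instead of producing 1.01.
df['Grouping'] = (thousandths // 10 + 1).clip(upper=100) / 100

print(df)
```

Either column can then be passed to groupby in the same way as the truncated PD column, e.g. df.groupby('Grouping').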