Search code examples
pythonpandasdataframedata-sciencecategorization

Replacing values based on multiple column values and conditions in pandas dataframe


I have the following dataframe:

     TIME   PREC    ET   PET  YIELD       date  year  intensity
0        1   1.21  0.02  0.02   0.00 1991-01-01  1991          1
1        2   0.00  0.03  0.04   0.00 1991-01-02  1991          1
2        3   0.00  0.03  0.05   0.00 1991-01-03  1991          1
3        4   0.00  0.04  0.05   0.00 1991-01-04  1991          1
4        5   0.00  0.05  0.07   0.00 1991-01-05  1991          1
5        6   0.00  0.03  0.05   0.00 1991-01-06  1991          1
6        7   0.00  0.02  0.04   0.00 1991-01-07  1991          1
7        8   1.14  0.03  0.04   0.00 1991-01-08  1991          1
8        9   0.10  0.02  0.03   0.00 1991-01-09  1991          1
9       10   0.00  0.03  0.04   0.00 1991-01-10  1991          1
10      11   0.10  0.05  0.11   0.00 1991-01-11  1991          1
11      12   0.00  0.06  0.15   0.00 1991-01-12  1991          1
12      13   2.30  0.14  0.44   0.00 1991-01-13  1991          1
13      14   0.17  0.09  0.29   0.00 1991-01-14  1991          1
14      15   0.00  0.13  0.35   0.00 1991-01-15  1991          1
15      16   0.00  0.14  0.39   0.00 1991-01-16  1991          1
16      17   0.00  0.10  0.31   0.00 1991-01-17  1991          1
17      18   0.00  0.15  0.51   0.00 1991-01-18  1991          1
18      19   0.00  0.22  0.58   0.00 1991-01-19  1991          1
19      20   0.10  0.04  0.09   0.00 1991-01-20  1991          1
20      21   0.00  0.04  0.06   0.00 1991-01-21  1991          1
21      22   0.27  0.13  0.43   0.00 1991-01-22  1991          1
22      23   0.00  0.10  0.25   0.00 1991-01-23  1991          1
23      24   0.00  0.03  0.04   0.00 1991-01-24  1991          1
24      25   0.00  0.04  0.05   0.00 1991-01-25  1991          1
25      26   0.43  0.04  0.15   0.00 1991-01-26  1991          1
26      27   0.17  0.06  0.23   0.00 1991-01-27  1991          1
27      28   0.50  0.02  0.04   0.00 1991-01-28  1991          1
28      29   0.00  0.03  0.04   0.00 1991-01-29  1991          1
29      30   0.00  0.04  0.08   0.00 1991-01-30  1991          1
   ...    ...   ...   ...    ...        ...   ...        ...
7275   336   0.00  0.04  0.06   0.03 2010-12-02  2010          1
7276   337   0.30  0.05  0.08   0.02 2010-12-03  2010          1
7277   338   1.62  0.08  0.12   0.02 2010-12-04  2010          1
7278   339   0.00  0.10  0.15   0.02 2010-12-05  2010          1
7279   340   0.00  0.09  0.15   0.02 2010-12-06  2010          1
7280   341   0.00  0.04  0.06   0.02 2010-12-07  2010          1
7281   342   0.00  0.04  0.06   0.02 2010-12-08  2010          1
7282   343   0.00  0.16  0.25   0.02 2010-12-09  2010          1
7283   344   0.00  0.22  0.35   0.02 2010-12-10  2010          1
7284   345   0.04  0.04  0.06   0.02 2010-12-11  2010          1
7285   346   0.01  0.02  0.03   0.02 2010-12-12  2010          1
7286   347   0.04  0.02  0.03   0.02 2010-12-13  2010          1
7287   348   0.00  0.05  0.08   0.02 2010-12-14  2010          1
7288   349   0.37  0.08  0.13   0.02 2010-12-15  2010          1
7289   350  13.19  0.08  0.13   0.02 2010-12-16  2010          1
7290   351   1.33  0.06  0.10   0.02 2010-12-17  2010          1
7291   352   0.00  0.08  0.12   0.02 2010-12-18  2010          1
7292   353   0.01  0.05  0.08   0.02 2010-12-19  2010          1
7293   354   0.00  0.03  0.04   0.02 2010-12-20  2010          1
7294   355   9.70  0.07  0.12   0.02 2010-12-21  2010          1
7295   356   0.00  0.06  0.11   0.02 2010-12-22  2010          1
7296   357   0.00  0.07  0.12   0.02 2010-12-23  2010          1
7297   358   0.13  0.08  0.13   0.02 2010-12-24  2010          1
7298   359   0.10  0.09  0.15   0.02 2010-12-25  2010          1
7299   360   0.00  0.07  0.12   0.02 2010-12-26  2010          1
7300   361   0.00  0.08  0.14   0.02 2010-12-27  2010          1
7301   362   0.61  0.18  0.32   0.02 2010-12-28  2010          1
7302   363   0.00  0.15  0.26   0.02 2010-12-29  2010          1
7303   364   3.95  0.10  0.18   0.02 2010-12-30  2010          1
7304   365   0.10  0.10  0.18   0.01 2010-12-31  2010          1

I would like to categorize the data based on the values in PREC and year columns. Here's the template of the code that I have:

for y in range(1991,2011):
    for i in df.loc[df.year == y, 'PREC']:
        if i <= np.percentile(df.loc[df.year == y, 'PREC'], 10):
            df.loc[df.PREC.isin([i]), 'intensity'] = 'light'

However, this code categorize intensity in the whole dataframe, but I need to categorize PREC for each year. So, how to do this?


Solution

  • So, basically, I should have used & to combine two conditions:

    for y in range(1991,2011):
        for i in df.loc[df.year == y, 'PREC']:
            if i <= np.percentile(df.loc[df.year == y, 'PREC'], 10):
                df.loc[(df['year'] == y) & (df['PREC'] == i), 'intensity'] = 'light'