Insert a customized series as a new column in a DataFrame with Pandas


Given this DataFrame with the columns category, Year, and Profit:

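import pandas as pd

# Sample data: one category, a Year column, and a Profit column
data = {'category': pd.Series(['A', 'A', 'A', 'A', 'A', 'A']),
        'Year': pd.Series([1, 1, 3, 3, 3, 4]),
        'Profit': pd.Series([10, 11, 5, 6, 30, 31])}
df = pd.DataFrame(data)
print(df)  # in a Jupyter notebook, display(df) also works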

How can I create a new column, Numbering, that follows the rules below, without manually entering the numbers one by one?

  1. Insert 0 in every row where Year < 3.
  2. Insert 1 in the first row where Year >= 3.
  3. After that, continue a geometric series with a common ratio of 0.5 in the remaining rows where Year >= 3.

The desired output is a new Numbering column that holds 0, 0, 1, 0.5, 0.25, and 0.125 for the six rows above.


Solution

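  • One approach uses cumsum: the running count of rows with Year >= 3, minus 1, gives the exponent for 0.5, and mask then sets the rows with Year < 3 to 0.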

    s = (0.5 ** (df.Year.ge(3).cumsum() - 1)).mask(df.Year < 3, 0)
    s
    Out[15]: 
    0    0.000
    1    0.000
    2    1.000
    3    0.500
    4    0.250
    5    0.125
    Name: Year, dtype: float64
    df['Numbering'] = s  # attach the result as the requested Numbering column
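
To see why the one-liner works, it may help to split it into its intermediate steps. The following is a minimal sketch, not the answerer's own code: the names in_series, position, and terms are hypothetical and introduced only for illustration, and where(cond, 0.0) is used in place of mask(~cond, 0), which should give the same result here.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A"] * 6,
    "Year": [1, 1, 3, 3, 3, 4],
    "Profit": [10, 11, 5, 6, 30, 31],
})

# Hypothetical helper names, used only to label each step.
# True for every row that should receive a term of the series.
in_series = df["Year"].ge(3)

# Running count of qualifying rows so far: 0, 0, 1, 2, 3, 4.
position = in_series.cumsum()

# Exponent is position - 1, so the first qualifying row gets 0.5**0 == 1.
terms = 0.5 ** (position - 1)

# Rows with Year < 3 would otherwise hold 0.5**-1 == 2; replace them with 0.
df["Numbering"] = terms.where(in_series, 0.0)

print(df)
```

Because cumsum only advances on rows where Year >= 3, the series continues in row order regardless of the actual Year values, which matches rule 3 for this data.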