Tags: python, pandas, dataframe, data-analysis

Count consecutive occurrences by condition in pandas


I have the following dataframe:

import pandas as pd

data = {'A': [0,0,0,1,1,1,0,1], 'B': [0,1,1,1,1,1,1,1], 'C': [1,0,1,0,1,1,1,0]}
df = pd.DataFrame(data)
df = df.transpose()
# Label the columns with their time periods
df = df.rename(index=str, columns={0: "20062", 1: "20063", 2: "20064", 3: "20071", 4: "20072", 5: "20073", 6: "20074", 7: "20081"})


Out[135]: 
   20062  20063  20064  20071  20072  20073  20074  20081
A      0      0      0      1      1      1      0      1
B      0      1      1      1      1      1      1      1
C      1      0      1      0      1      1      1      0

My main task is to find the number of "disappearances".

Let us define a "disappearance" as the case where a 1 is followed by a 0.

So, the expected outcome in this example is that A disappears only once (in 20074), B disappears zero times, and C disappears 3 times (in 20063, 20071, and 20081).

I want to do the following:

  1. total number of disappearances by time (the columns in this example, so in 20063 there was one disappearance, again one in 20071, etc.)
  2. by type: A disappeared once in 20074, C disappeared 3 times in 20063, 20071 and 20081
  3. total number of disappearances (here 4)

Can someone help me do this in Python?

My dataframe is quite large, so I would ideally look for a general solution.

Thanks


Solution

  • You can use diff along the columns, flag the entries equal to -1 (a 1 followed by a 0), and sum across axis=None to get the total number of disappearances

    >>> df.diff(axis=1).eq(-1).values.sum(axis=None)
    4
    

    To get the count per row (per type), sum across axis=1

    df.diff(axis=1).eq(-1).sum(axis=1)
    
    A    1
    B    0
    C    3
    dtype: int64
    

    To get the count per time period, sum across axis=0

    df.diff(axis=1).eq(-1).sum(axis=0)
    
    20062    0
    20063    1
    20064    0
    20071    1
    20072    0
    20073    0
    20074    1
    20081    1
    dtype: int64
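
The question also asks in which periods each type disappears, which the counts above do not show directly. A minimal sketch of one way to get that from the same mask, assuming the frame holds only 0/1 values (otherwise a difference of -1 would not necessarily mean "1 followed by 0"). The names mask and periods_by_type are illustrative, not from the original answer:

```python
import pandas as pd

# Rebuild the example frame from the question (rows are types, columns are periods).
periods = ["20062", "20063", "20064", "20071", "20072", "20073", "20074", "20081"]
df = pd.DataFrame(
    {"A": [0, 0, 0, 1, 1, 1, 0, 1],
     "B": [0, 1, 1, 1, 1, 1, 1, 1],
     "C": [1, 0, 1, 0, 1, 1, 1, 0]},
    index=periods,
).T

# True wherever a 1 in the previous period is followed by a 0 (difference of -1).
# Assumption: values are strictly 0/1. The first column has no predecessor,
# so its difference is NaN and compares as False.
mask = df.diff(axis=1).eq(-1)

total = int(mask.to_numpy().sum())  # overall number of disappearances
per_type = mask.sum(axis=1)         # count per row
per_period = mask.sum(axis=0)       # count per column

# Illustrative extra step: list the periods in which each type disappears.
periods_by_type = {
    name: list(mask.columns[row.to_numpy()])
    for name, row in mask.iterrows()
}
print(periods_by_type)
# {'A': ['20074'], 'B': [], 'C': ['20063', '20071', '20081']}
```

Computing the mask once and reusing it avoids repeating the diff for each aggregation, which may matter on a large dataframe.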