python pandas pandas-groupby sklearn-pandas

Pandas - Counting rows in a df to discover the survival rate each day

Hello, guys!

I have a dfA (Table A) containing the amount of days that some products have been available (days_survived). I need to count the number of products that were available each day in total (Table B). I mean, I need counting rows in dfA to discover the survival rate each day for the first 5 days (df2).

Table A:

+-------+--------------+
| id    | days_survived|
+-------+--------------+
| 1     |  1           |
| 2     |  3           |
| 3     |  10          | 
| 4     |  40          |
| 5     |  4           |
| 6     |  9           |
+-------+--------------+

Table B (Expected results analyzing the first 5 days):

+-------+----------------+
| day   | #count_survived|
+-------+----------------+
| 1     |  6             |
| 2     |  5             |
| 3     |  5             | 
| 4     |  4             |
| 5     |  3             |
+-------+----------------+

This result means that on the first day 6 products in total were available, then only 5 on the second and third day, then only 4 on the fourth day and finally only 3 on the fifth day.

code:

# create df
import pandas as pd
d = {'id': [1,2,3,4,5,6], 'days_survived': [1,3,10,40,4,9]}
dfA = pd.DataFrame(data=d)

Could anyone help me, please? :)

Solution

Use list comprehension with flattening and filtering and then count:

comp = [y for x in dfA['days_survived'] for y in range(1, x + 1) if y < 6]
s = pd.Series(comp).value_counts().rename_axis('day').reset_index(name='#count_survived')
print (s)
   day  #count_survived
0    1                6
1    3                5
2    2                5
3    4                4
4    5                3

Another solution with Counter:

from collections import Counter

comp = [y for x in dfA['days_survived'] for y in range(1, x + 1) if y < 6]
d = Counter(comp)
df = pd.DataFrame({'day':list(d.keys()), '#count_survived':list(d.values())})