I have a simple data frame with IDs and date, like below:
'ID Date
a 2009/12/1
c 2009/12/1
d 2009/12/1
a 2010/4/1
c 2010/5/1
e 2010/5/1
b 2010/12/1
b 2012/3/1
e 2012/7/1
b 2013/1/1
...
...'
I need to count unique values by each month and accumulate them but not counting existing IDs. For instance
`2009/12/1 3
2010/4/1 3
2010/5/1 4
... ...`
I created a loop but not working
`for d in df['date'].drop_duplicates():
c=df[df['date']<=d].ID.nunique()
df2=DataFrame(data=c,index=d)`
Can anyone tell me where is the problem? thanks
One option is to write a for loop and use a set to hold the cumulative unique IDs:
cumcount = []
cumunique = set()
date = []
for k, g in df.groupby(pd.to_datetime(df.Date)):
cumunique |= set(g.ID) # hold cumulative unique IDs
date.append(g.Date.iat[0]) # get the date variable for each group
cumcount.append(len(cumunique)) # hold cumulative count of unique IDs
pd.DataFrame({"Date": date, "ID": cumcount})