I am starting to work with time series. I have one of a user doing bank transfers to different countries, however the most frequent country to where he/she is doing the transfers is X, but there are transfers also to the countries Y and Z. Let's say:
date id country
2020-01-01T00:00:00.000Z id_01 X
2020-01-01T00:20:00.000Z id_02 X
2020-01-01T00:25:00.000Z id_03 Y
2020-01-01T00:35:00.000Z id_04 X
2020-01-01T00:45:00.000Z id_05 Z
2020-01-01T01:00:00.000Z id_06 X
2020-01-01T10:20:00.000Z id_07 X
2020-01-01T10:25:00.000Z id_08 X
2020-01-01T13:00:00.000Z id_09 X
2020-01-01T18:45:00.000Z id_10 Z
2020-01-01T18:55:00.000Z id_11 X
Since the most frequent country is X, I would like to count iteratively how many transactions have been done within one hour (in the whole list of events) to countries different than X.
The format of the expected output for this particular case would be:
date id country
2020-01-01T00:25:00.000Z id_03 Y
2020-01-01T00:45:00.000Z id_05 Z
Starting from 2020-01-01T00:00:00.000Z
, within one hour there are two Y, Z transactions. Then starting from 2020-01-01T00:20:00.000Z
, within one hour, there are the same transactions, and so on. Then, starting from 2020-01-01T10:20:00.000Z
, within one hour, all are X. Starting from 2020-01-01T18:45:00.000Z
, within one hour, there is only one Z.
I am trying with a double for loop and .value_counts(), but I'm not sure of what I am doing.
IIUC, you can select only the rows not X, then use diff
once forward and once backward (within 1 hour before and after) and you want where any of the two diff is below a Timedelta
of 1h.
#convert to datetime
df['date'] = pd.to_datetime(df['date'])
#mask not X and select only these rows
mX = df['country'].ne('X')
df_ = df[mX].copy()
# mask within an hour before and after
m1H = (df_['date'].diff().le(pd.Timedelta(hours=1)) |
df_['date'].diff(-1).le(pd.Timedelta(hours=1)) )
# selet only the rows meeting criteria on X and 1H
df_ = df_[m1H]
print (df_)
date id country
2 2020-01-01 00:25:00+00:00 id_03 Y
4 2020-01-01 00:45:00+00:00 id_05 Z