Python string to datetime - Yelp data

The Yelp dataset provides check-in information as strings:

Business_id	Date
A	2010-04-22 05:31:33, 2010-05-09 18:24:50,...
B	2010-03-07 02:04:38, 2010-04-11 01:45:57,2014-05-02 18:40:35, 2014-05-06 17:59:33,...

I want to calculate the daily number of check-in for each business.

Solution

Let suppose your data in a text file or a CSV file:

Sample Data

A,"2010-03-07 02:04:38,2010-04-11 01:45:57,2014-05-02 18:40:35,2014-05-06 17:59:33,2021-07-06 08:02:15,2021-07-06 10:01:18"
B,"2010-03-07 02:04:38,2010-04-11 01:45:57,2014-05-02 18:40:35,2014-05-06 17:59:33,2014-05-07 18:02:33,2021-07-06 08:05:15,2021-07-06 10:01:20"
C,"2010-03-07 02:04:38,2010-04-11 01:45:57,2014-05-02 18:40:35,2014-05-06 17:59:33,2014-05-08 16:05:20,2014-05-08 17:06:10,2021-07-06 10:01:19,2021-07-06 08:02:30,2021-07-06 10:01:20,2021-07-06 10:01:28"

You could read the data into a Dataframe and attempt the following:

df = pd.read_csv(r"/dir/filepath/filename.txt", header=None, delimiter=',')
df.columns = ["B_id", "Date"]

# explode converts the list into separate rows

df = df.assign(Date= df.Date.str.split(',')).explode("Date")
df["Date"] = pd.to_datetime(df["Date"])
print(df)
today = datetime.today().date()
today_df = df[df["Date"].dt.date == today]
grouped_df = today_df.groupby("B_id")["Date"].count()
grouped_df.head()

The Output after .explode():

    B_id    Date
0   A   2010-03-07 02:04:38
0   A   2010-04-11 01:45:57
0   A   2014-05-02 18:40:35
0   A   2014-05-06 17:59:33
0   A   2021-07-06 08:02:15
0   A   2021-07-06 10:01:18
1   B   2010-03-07 02:04:38
1   B   2010-04-11 01:45:57
1   B   2014-05-02 18:40:35
1   B   2014-05-06 17:59:33
1   B   2014-05-07 18:02:33
1   B   2021-07-06 08:05:15
1   B   2021-07-06 10:01:20
2   C   2010-03-07 02:04:38
2   C   2010-04-11 01:45:57
2   C   2014-05-02 18:40:35
2   C   2014-05-06 17:59:33
2   C   2014-05-08 16:05:20
2   C   2014-05-08 17:06:10

The final output:

B_id  
A     2
B     2
C     4