python, pandas, regex, nlp

How to count specific keywords in a transcript with a condition


I have a large DataFrame with a "Transcript" column containing conversations between a bot and a user. I need to count how many times the user asks for an agent/representative right away, i.e. in their first message, before giving the bot a chance to help.

The transcripts look like this (the real ones are longer):

"User : Order status.\nBot : Your order status is your orders tab. \nUser : representative."

"User : Agent please.\nBot : Waiting time is longer than usual."

I tried using a regular expression:

    df["Transcript"] = df["Transcript"].str.lower()
    df.loc[df["Transcript"].str.contains('agent|representative'),:]

But this only returns the rows that contain those keywords anywhere in the transcript. How can I get the number of transcripts in which the user's first message asks for an agent/representative?
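For reference, here is a minimal, self-contained sketch using the two sample transcripts above (building the DataFrame this way is an assumption about how the data is stored). Summing the boolean mask returned by `str.contains` does produce a number, but it counts every transcript that mentions a keyword anywhere, so the first sample, where the user asks for a representative only after the bot has replied, is counted too:

```python
import pandas as pd

# The two sample transcripts from the question (assumed storage format)
df = pd.DataFrame({
    "Transcript": [
        "User : Order status.\nBot : Your order status is your orders tab. \nUser : representative.",
        "User : Agent please.\nBot : Waiting time is longer than usual.",
    ]
})

# Boolean mask: True wherever a keyword appears anywhere in the transcript
mask = df["Transcript"].str.lower().str.contains("agent|representative")

# Summing a boolean Series counts its True values
any_mention_count = mask.sum()
print(any_mention_count)  # 2: both transcripts mention a keyword somewhere
```

The desired answer for these two samples is 1, because only the second user asks for an agent in their very first message.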


Solution

  • I'd do it by splitting each transcript on newlines and keeping only the first line (the user's first message, before the bot has responded), then searching that line for your terms, and finally summing the boolean result to get the number of transcripts in which the user requested an agent in their first message:

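    # Keep only the first line (the user's opening message), then look for the keywords;
    # case=False makes the match case-insensitive even without lowercasing first
    df['Transcript'].str.split('\n').str.get(0).str.contains('agent|representative', case=False).sum()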
    
    # Output with your examples: 1
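If some transcripts might open with a bot greeting, taking the first line would check the bot's text rather than the user's. Below is a minimal sketch of a more defensive variant, assuming each turn sits on its own line and starts with a speaker label ("User" or "Bot") followed by a colon, as in the samples. The helper `first_user_message_asks_for_agent` and the `KEYWORDS` pattern are hypothetical names: the helper scans for the first `User` line and checks only that message, and the pattern uses word boundaries plus `re.IGNORECASE` so that "Agent" or "agents" match but longer words merely containing "agent" do not. The third transcript is a made-up example in which the bot speaks first.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Transcript": [
        "User : Order status.\nBot : Your order status is your orders tab. \nUser : representative.",
        "User : Agent please.\nBot : Waiting time is longer than usual.",
        # Hypothetical example: the bot greets before the user speaks
        "Bot : Hi! How can I help?\nUser : I want a human agent.",
    ]
})

# Hypothetical pattern: whole words "agent(s)" or "representative(s)", any case
KEYWORDS = re.compile(r"\b(?:agent|representative)s?\b", re.IGNORECASE)


def first_user_message_asks_for_agent(transcript: str) -> bool:
    """Return True if the user's first message contains a keyword.

    Assumes one turn per line, formatted as "<Speaker> : <text>".
    """
    for line in transcript.split("\n"):
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip().lower() == "user":
            # Only the first user turn is checked; later turns are ignored
            return bool(KEYWORDS.search(text))
    return False  # no user turn found


flags = df["Transcript"].apply(first_user_message_asks_for_agent)
count = int(flags.sum())
print(count)  # 2: the second and third transcripts
```

Because `Series.apply` calls the Python function once per row, this will be slower than the vectorized one-liner on a very large frame; if every transcript is known to start with the user's turn, the one-liner above is enough.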