Extract individual speech acts from call transcript

I have call transcript data as follows:

'[0:00:00] spk1 : Hi how are you [0:00:02] spk2 : I am good, need help on my phone. 
[0:00:10] spk1 : sure, let me know the issue'

I want the text data for spk1 separated from spk2.

I tried this:

import re

text = "[0:00:00] spk1 : Hi how are you [0:00:02] spk2 : I am good, need help on my phone. [0:00:10] spk1 : sure, let me know the issue"

m = re.search('\](.+?)\[', text)
if m:
    found = m.group
found

But I am not getting the answer.

Solution

Assuming you want to keep the order, the timestamps and the speaker information, and want to handle a flexible number of speakers (including the same speaker talking under two or more timestamps in a row):

import re

text = "[0:00:00] spk1 : Hi how are you [0:00:02] spk2 : I am good, need help on my phone. [0:00:10] spk1 : sure, let me know the issue"

conversation_dict_list = []
# iterate over tokens split on whitespace
for token in text.split():
    # timestamp: append a new dict with the time, no speaker yet and empty text
    if re.fullmatch(r"\[\d+:\d\d:\d\d\]", token):
        conversation_dict_list.append({"time": token[1:-1], "speaker": None, "text": ""})
    # speaker: fill speaker field
    elif re.fullmatch(r"spk\d+", token):
        conversation_dict_list[-1]["speaker"] = token
    # text: keep concatenating to text field (plus whitespace)
    else:  
        conversation_dict_list[-1]["text"] += " " + token

# remove the leading " : " (three characters) from each text
for d in conversation_dict_list:
    d["text"] = d["text"][3:]

print(conversation_dict_list)

This prints:

> [{'time': '0:00:00', 'speaker': 'spk1', 'text': 'Hi how are you'}, {'time': '0:00:02', 'speaker': 'spk2', 'text': 'I am good, need help on my phone.'}, {'time': '0:00:10', 'speaker': 'spk1', 'text': 'sure, let me know the issue'}]
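Since the question asks for the text of spk1 separated from spk2, a list like the one printed above can be grouped by speaker in one more step. This is only a minimal sketch: the list is copied in literally so the snippet runs on its own, and the names conversation and text_by_speaker are made up for illustration.

```python
from collections import defaultdict

# Same structure as the list printed above, copied here so the sketch is self-contained.
conversation = [
    {"time": "0:00:00", "speaker": "spk1", "text": "Hi how are you"},
    {"time": "0:00:02", "speaker": "spk2", "text": "I am good, need help on my phone."},
    {"time": "0:00:10", "speaker": "spk1", "text": "sure, let me know the issue"},
]

# Hypothetical name: maps each speaker to that speaker's utterances, in spoken order.
text_by_speaker = defaultdict(list)
for turn in conversation:
    text_by_speaker[turn["speaker"]].append(turn["text"])

print(dict(text_by_speaker))
```

A defaultdict avoids having to know the speaker labels in advance, so the same loop should work for any number of speakers.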

This only works if every turn follows the exact pattern [h:mm:ss] spkX. If, for example, several speakers appear under the same timestamp, the speaker field is overwritten each time and only the last speaker is kept.
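As an alternative closer to the regex attempt in the question, the turns can also be pulled out with a single pattern. The original attempt fell short for two reasons: re.search only returns the first match, and m.group is a method that has to be called, e.g. m.group(1). The sketch below uses re.finditer instead. It assumes the same [h:mm:ss] spkN : layout as the sample, and the names turn_pattern and turns are made up for illustration.

```python
import re

text = "[0:00:00] spk1 : Hi how are you [0:00:02] spk2 : I am good, need help on my phone. [0:00:10] spk1 : sure, let me know the issue"

# One turn = timestamp, speaker label, then everything up to the next timestamp
# or the end of the string. The layout is an assumption based on the sample data.
turn_pattern = re.compile(
    r"\[(?P<time>\d+:\d\d:\d\d)\]\s*"   # [h:mm:ss]
    r"(?P<speaker>spk\d+)\s*:\s*"       # spkN followed by a colon
    r"(?P<text>.*?)"                    # the utterance, matched non-greedily
    r"\s*(?=\[\d+:\d\d:\d\d\]|$)",      # stop before the next timestamp or at the end
    re.DOTALL,                          # let the utterance span line breaks
)

# finditer yields every match, not only the first one
turns = [m.groupdict() for m in turn_pattern.finditer(text)]

for turn in turns:
    print(turn)
```

Because the pattern works on the whole string rather than on whitespace-separated tokens, it does not need the extra step that strips the " : " separator.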