Tags: python, amazon-web-services, audio, full-text-search, speech-to-text

How can I search content in an audio file?


I have an audio file and used AWS Transcribe to get its text. I now have a JSON file containing the transcript, along with the start time and end time of every word. For example:

[image of the JSON file not available]

How can I search for a complete sentence and get back the time at which it was said? I am using Python to do this.

Thank you for your help.


Solution

  • I would extract all words and their times into lists and look for the occurrence of the sentence. Something like this, if I understood your data format correctly (always using the first alternative as the extracted word):

    def extract_words_and_time(data):
        # A raw Transcribe output file nests the list under data['results']['items'];
        # fall back to data['items'] if the 'results' part was already extracted.
        items = data.get('results', data)['items']
        word_list = []
        time_list = []
        for item in items:
            # Punctuation items in Transcribe output carry no timestamps, so skip them.
            if 'start_time' not in item:
                continue
            word_list.append(item['alternatives'][0]['content'].lower())
            time_list.append((item['start_time'], item['end_time']))
        return word_list, time_list
    
    def get_sub_list_index(sub_list, complete_list):
        # Return (first_index, last_index) of the first match, or None if there is none.
        if not sub_list:
            return None
        sublist_length = len(sub_list)
        for ind in (i for i, element in enumerate(complete_list) if element == sub_list[0]):
            if complete_list[ind:ind + sublist_length] == sub_list:
                return ind, ind + sublist_length - 1
        return None
    
    def get_start_and_end_time(sentence, word_list, time_list):
        # time_list is passed in explicitly instead of being read from a global.
        matching_start_stop = get_sub_list_index(sentence.lower().split(), word_list)
        if matching_start_stop:
            start_time = time_list[matching_start_stop[0]][0]
            end_time = time_list[matching_start_stop[1]][1]
            return start_time, end_time
        return None
    
    word_list, time_list = extract_words_and_time(your_data_from_json)
    sentence = 'Bonjour mon petit chien'
    sentence_timing = get_start_and_end_time(sentence, word_list, time_list)
    
    if sentence_timing:
        print(f'Start: {sentence_timing[0]}, Stop: {sentence_timing[1]}')
    else:
        print('Sentence was not found')
    

    I cannot really test it, but in theory it should work ;)
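
Two things the approach above does not cover are punctuation in the searched sentence and a sentence that is said more than once. Here is a minimal, self-contained sketch of one way to handle both. The sample JSON is hypothetical data shaped like the usual Transcribe layout (`results` -> `items`, with `punctuation` items that have no timestamps), and the helper names `normalize`, `extract_timed_words` and `find_all_occurrences` are made up for this illustration, so adjust them to your actual file.

```python
import json
import string

# Hypothetical sample, shaped like the "results" part of a Transcribe output file.
SAMPLE_JSON = """
{"results": {"items": [
  {"start_time": "0.10", "end_time": "0.52", "type": "pronunciation", "alternatives": [{"content": "Bonjour"}]},
  {"type": "punctuation", "alternatives": [{"content": ","}]},
  {"start_time": "0.60", "end_time": "0.75", "type": "pronunciation", "alternatives": [{"content": "mon"}]},
  {"start_time": "0.75", "end_time": "1.10", "type": "pronunciation", "alternatives": [{"content": "petit"}]},
  {"start_time": "1.10", "end_time": "1.50", "type": "pronunciation", "alternatives": [{"content": "chien"}]},
  {"type": "punctuation", "alternatives": [{"content": "."}]},
  {"start_time": "2.00", "end_time": "2.40", "type": "pronunciation", "alternatives": [{"content": "Bonjour"}]},
  {"start_time": "2.45", "end_time": "2.60", "type": "pronunciation", "alternatives": [{"content": "mon"}]},
  {"start_time": "2.60", "end_time": "2.95", "type": "pronunciation", "alternatives": [{"content": "petit"}]},
  {"start_time": "2.95", "end_time": "3.40", "type": "pronunciation", "alternatives": [{"content": "chien"}]}
]}}
"""


def normalize(word):
    """Lowercase a word and strip leading/trailing ASCII punctuation."""
    return word.lower().strip(string.punctuation)


def extract_timed_words(data):
    """Return a list of (word, start_seconds, end_seconds), skipping untimed items."""
    timed_words = []
    for item in data["results"]["items"]:
        if "start_time" not in item:  # punctuation items have no timestamps
            continue
        word = normalize(item["alternatives"][0]["content"])
        timed_words.append((word, float(item["start_time"]), float(item["end_time"])))
    return timed_words


def find_all_occurrences(sentence, timed_words):
    """Return a list of (start_seconds, end_seconds), one per match of the sentence."""
    target = [w for w in (normalize(t) for t in sentence.split()) if w]
    if not target:
        return []
    words = [w for w, _, _ in timed_words]
    n = len(target)
    hits = []
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            # Start of the first matched word, end of the last matched word.
            hits.append((timed_words[i][1], timed_words[i + n - 1][2]))
    return hits


timed_words = extract_timed_words(json.loads(SAMPLE_JSON))
for start, end in find_all_occurrences("Bonjour, mon petit chien!", timed_words):
    print(f"Start: {start}, Stop: {end}")
```

With a real file you would replace `json.loads(SAMPLE_JSON)` with `json.load(f)` on the opened transcript. The match is still exact word by word, so a word that was transcribed differently from what you typed will not be found.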