python audio wav mp4

Convert mp4 sound to text in python


I want to convert a sound recording from Facebook Messenger to text. Here is an example of an .mp4 file sent using Facebook's API: https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833

So this file includes only audio (not video) and I want to convert it to text.

Moreover, I want to do it as fast as possible, since I'll use the generated text in an almost real-time application (i.e. the user sends the .mp4 file, the script transcribes it to text and shows the text back).

I've found this example https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py and here is the code I use:

import requests
import speech_recognition as sr

url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
r = requests.get(url)

with open("test.mp4", "wb") as handle:
    for data in r.iter_content():
        handle.write(data)

r = sr.Recognizer()
with sr.AudioFile('test.mp4') as source:
    audio = r.record(source)

command = r.recognize_google(audio)
print command

But I'm getting this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Asterios\Anaconda2\lib\site-packages\speech_recognition\__init__.py", line 200, in __enter__
    self.audio_reader = aifc.open(aiff_file, "rb")
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 952, in open
    return Aifc_read(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 347, in __init__
    self.initfp(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 298, in initfp
    chunk = Chunk(file)
  File "C:\Users\Asterios\Anaconda2\lib\chunk.py", line 63, in __init__
    raise EOFError
EOFError

Any ideas?

EDIT: I want to run the script on the free plan of pythonanywhere.com, so I'm not sure how I can install tools like ffmpeg there.

EDIT 2: If you run the above script, replacing the URL with this one "http://www.wavsource.com/snds_2017-01-08_2348563217987237/people/men/about_time.wav" and changing 'mp4' to 'wav', then it works fine. So it is for sure something with the file format.
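
The file-format suspicion can be checked without any extra tools by looking at the first bytes of the file. As far as the SpeechRecognition documentation goes, sr.AudioFile reads WAV, AIFF and FLAC, not MP4. The sketch below is only an illustration: the helper names sniff_audio_container and sniff_file are made up for this example, and the check covers just the common signatures of these four containers.

```python
def sniff_audio_container(header):
    """Guess the container format from the first 12 bytes of a file.

    Hypothetical helper for illustration; it only knows four signatures.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    # MP4 files start with a 4-byte box size followed by the box type "ftyp".
    if header[4:8] == b"ftyp":
        return "mp4"
    return "unknown"


def sniff_file(path):
    """Read the first 12 bytes of a file and guess its container format."""
    with open(path, "rb") as handle:
        return sniff_audio_container(handle.read(12))
```

Calling sniff_file("test.mp4") on the downloaded clip should report "mp4" if the file really is an MP4 container, which would explain why the WAV/AIFF readers give up on it.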


Solution

  • I finally found a solution. I'm posting it here in case it helps someone in the future.

    Fortunately, pythonanywhere.com comes with avconv pre-installed (avconv is similar to ffmpeg).

    So here is some code that works:

    # Python 2 script (urllib2 and the print statement).
    import urllib2
    import speech_recognition as sr
    import subprocess
    import os

    url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
    mp4file = urllib2.urlopen(url)

    # Save the downloaded clip to disk so avconv can read it.
    with open("test.mp4", "wb") as handle:
        handle.write(mp4file.read())

    # Convert the mp4 to WAV, which sr.AudioFile can read; -vn drops any video stream.
    cmdline = ['avconv',
               '-i',
               'test.mp4',
               '-vn',
               '-f',
               'wav',
               'test.wav']
    subprocess.call(cmdline)

    # Load the whole WAV file and send it to Google's speech recognition service.
    r = sr.Recognizer()
    with sr.AudioFile('test.wav') as source:
        audio = r.record(source)

    command = r.recognize_google(audio)
    print command

    # Clean up the temporary files.
    os.remove("test.mp4")
    os.remove("test.wav")
    

    On the free plan, cdn.fbsbx.com was not on pythonanywhere's whitelist of allowed sites, so I could not download the content with urllib2. I contacted them and they added the domain to the whitelist within 1-2 hours!

    So a huge thanks and congrats to them for the excellent service even though I'm using the free tier.
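
For anyone on Python 3, the same download, convert, transcribe steps could look roughly like the sketch below. It is not the code I ran: the function names (find_converter, build_convert_cmd, transcribe_url) are made up for this example, it assumes either ffmpeg or avconv is on the PATH, and it assumes the speech_recognition package is installed. The converter flags are the same ones used in the script above.

```python
import os
import shutil
import subprocess
import tempfile
import urllib.request


def find_converter():
    """Return the first of ffmpeg/avconv found on PATH, or None."""
    for name in ("ffmpeg", "avconv"):
        if shutil.which(name):
            return name
    return None


def build_convert_cmd(converter, src, dst):
    """Command line that drops any video stream and writes WAV output."""
    return [converter, "-i", src, "-vn", "-f", "wav", dst]


def transcribe_url(url):
    """Download an audio-only .mp4, convert it to WAV, and return the text.

    Returns None when the recognizer cannot make out any speech.
    """
    # Third-party package, imported here so the helpers above work without it.
    import speech_recognition as sr

    converter = find_converter()
    if converter is None:
        raise RuntimeError("neither ffmpeg nor avconv was found on PATH")

    # The temporary directory and both files are removed automatically.
    with tempfile.TemporaryDirectory() as workdir:
        mp4_path = os.path.join(workdir, "clip.mp4")
        wav_path = os.path.join(workdir, "clip.wav")

        with urllib.request.urlopen(url) as response, open(mp4_path, "wb") as out:
            shutil.copyfileobj(response, out)

        # check_call raises CalledProcessError if the conversion fails.
        subprocess.check_call(build_convert_cmd(converter, mp4_path, wav_path))

        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None
```

Usage would be something like text = transcribe_url(url) with the Messenger clip URL. Compared with the script above, check_call stops the script when the conversion fails instead of carrying on to a missing test.wav, and the temporary directory avoids leftover files if an exception is raised midway.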