
Split audio files using silence detection


I have more than 200 MP3 files and I need to split each one of them using silence detection. I tried Audacity and WavePad, but neither offers batch processing, and doing the files one by one is very slow.

The scenario is as follows:

  • split each track wherever there is silence of 2 seconds or more
  • then add 0.5 s of silence at the start and the end of each resulting track and save it as .mp3
  • bitrate 192 kbps, stereo
  • normalize the volume so that all files have the same volume and quality

I tried FFmpeg but had no success.


Solution

  • I found pydub to be the easiest tool for this kind of audio manipulation; it keeps the code simple and compact.

    You can install pydub with

    pip install pydub
    

    You may also need to install ffmpeg or libav, which pydub uses to read and write MP3 files. See this link for more details.

    Here is a snippet that does what you asked. Some of the parameters, such as silence_thresh and target_dBFS, may need tuning to match your requirements. Overall, I was able to split mp3 files, although I had to try different values for silence_thresh.

    Snippet

    # Import the AudioSegment class for processing audio and the 
    # split_on_silence function for separating out silent chunks.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence
    
    # Define a function to normalize a chunk to a target amplitude.
    def match_target_amplitude(aChunk, target_dBFS):
        ''' Normalize given audio chunk '''
        change_in_dBFS = target_dBFS - aChunk.dBFS
        return aChunk.apply_gain(change_in_dBFS)
    
    # Load your audio.
    song = AudioSegment.from_mp3("your_audio.mp3")
    
    # Split track where the silence is 2 seconds or more and get chunks using 
    # the imported function.
    chunks = split_on_silence(
        # Use the loaded audio.
        song, 
        # Specify that a silent chunk must be at least 2 seconds or 2000 ms long.
        min_silence_len = 2000,
        # Consider a chunk silent if it's quieter than -16 dBFS.
        # (You may want to adjust this parameter.)
        silence_thresh = -16
    )
    
    # Process each chunk with your parameters
    for i, chunk in enumerate(chunks):
        # Create a silence chunk that's 0.5 seconds (or 500 ms) long for padding.
        silence_chunk = AudioSegment.silent(duration=500)
    
        # Add the padding chunk to beginning and end of the entire chunk.
        audio_chunk = silence_chunk + chunk + silence_chunk
    
        # Normalize the entire chunk.
        normalized_chunk = match_target_amplitude(audio_chunk, -20.0)
    
        # Export the audio chunk to the current directory at 192 kbps.
        print("Exporting chunk{0}.mp3.".format(i))
        normalized_chunk.export(
            "./chunk{0}.mp3".format(i),
            bitrate = "192k",
            format = "mp3"
        )
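
A note on tuning: -16 dBFS is an absolute level, so a fixed threshold may suit one recording and fail on a quieter or louder one. One common heuristic, offered here as an assumption and not as part of the snippet above, is to place the threshold a fixed number of decibels below the track's own average level (song.dBFS). The helper names and the 16 dB offset in this sketch are hypothetical starting points; it also shows the arithmetic behind the gain that match_target_amplitude applies.

```python
def relative_silence_thresh(average_dbfs, offset_db=16.0):
    """Return a silence threshold `offset_db` below the track's average level.

    `offset_db` is an assumed starting value, not a pydub default.
    """
    return average_dbfs - offset_db


def gain_to_amplitude_ratio(gain_db):
    """Convert a gain in dB to the linear amplitude factor it corresponds to."""
    return 10 ** (gain_db / 20.0)


# Example with made-up numbers: a track whose average level is -23 dBFS.
average_dbfs = -23.0
thresh = relative_silence_thresh(average_dbfs)  # -23 - 16 = -39 dBFS

# Gain needed to bring that track to a -20 dBFS target: target minus current.
gain_db = -20.0 - average_dbfs                  # +3 dB

print("threshold:", thresh, "dBFS")
print("gain:", gain_db, "dB, amplitude factor:", gain_to_amplitude_ratio(gain_db))
```

With pydub, the threshold would then be passed as silence_thresh=relative_silence_thresh(song.dBFS). Whether 16 dB is the right offset depends on how noisy the pauses in your recordings are, so expect to experiment.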
    

    If your original audio is stereo (2-channel), your chunks will also be stereo. You can check the original audio like this:

    >>> song.channels
    2
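
Since the question is about more than 200 files, the same steps can be wrapped in a loop over a folder. The following is only a minimal sketch under stated assumptions: the function names (plan_outputs, split_file, split_folder), the output naming scheme, and the folder names are hypothetical, and the default parameter values simply repeat the ones used in the snippet above.

```python
from pathlib import Path


def plan_outputs(input_dir, output_dir):
    """Map each MP3 in input_dir to a filename prefix inside output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    # Note: "*.mp3" is case-sensitive on some platforms (".MP3" is skipped).
    return {
        mp3: output_dir / mp3.stem
        for mp3 in sorted(input_dir.glob("*.mp3"))
    }


def split_file(mp3_path, out_prefix, min_silence_len=2000, silence_thresh=-16,
               padding_ms=500, target_dbfs=-20.0):
    """Split one file on silence and export padded, normalized chunks."""
    # Imported here so the path planning above works without pydub installed.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    song = AudioSegment.from_mp3(str(mp3_path))
    chunks = split_on_silence(song, min_silence_len=min_silence_len,
                              silence_thresh=silence_thresh)
    padding = AudioSegment.silent(duration=padding_ms)
    for i, chunk in enumerate(chunks):
        padded = padding + chunk + padding
        # Same normalization as match_target_amplitude in the snippet above.
        normalized = padded.apply_gain(target_dbfs - padded.dBFS)
        normalized.export(f"{out_prefix}_chunk{i}.mp3",
                          format="mp3", bitrate="192k")
    return len(chunks)


def split_folder(input_dir, output_dir):
    """Run split_file over every MP3 found directly in input_dir."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for mp3, prefix in plan_outputs(input_dir, output_dir).items():
        count = split_file(mp3, prefix)
        print(f"{mp3.name}: exported {count} chunks")
```

Calling split_folder("input_mp3s", "output_chunks") would then write files such as output_chunks/track01_chunk0.mp3 for an input named track01.mp3. Keeping the path planning separate from the audio work makes it easy to check which files will be processed before starting a long run.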