
Split audio file into several files, each below a size threshold


I have a FLAC file which I need to split into several distinct FLAC files, each of which must be below 100 MB in size. Are there any UNIX tools which can do this for me? Can I implement this logic myself?

Side-note: since FLAC is compressed, I figure that the easiest solution will require first converting the file to WAV.


Solution

  • There are two parts to your question.

    • Convert the existing FLAC audio file to another format such as WAV.
    • Split the converted WAV file into chunks of a specific size.

    There is more than one way to do this, but pydub provides straightforward methods for both steps. See the pydub documentation for details.

    1) Convert the existing FLAC audio file to another format such as WAV

    Using pydub, you can read the FLAC file and convert it to WAV as follows:

    from pydub import AudioSegment

    flac_audio = AudioSegment.from_file("sample.flac", "flac")
    flac_audio.export("audio.wav", format="wav")
    

    2) Split the converted WAV file into chunks of a specific size

    Again, there are various ways to do this. I determined the total duration and size of the converted WAV file and then scaled the duration down to match the desired chunk size.

    The sample WAV file was 101,612 KB in size and about 589 seconds long, a little under 10 minutes.

    WAV file size by observation:

    Stereo 16-bit audio files at a 44.1 kHz frame rate are approximately 10 MB per minute; 48 kHz would be a little larger. The corresponding mono file would be about 5 MB per minute.

    This approximation holds for our sample file, at about 10 MB per minute.
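
    As a rough check on that rule of thumb, the per-minute figures can be computed directly. This is a minimal sketch that assumes 16-bit (2-byte) uncompressed PCM samples; the helper name bytes_per_minute is made up for illustration.

```python
def bytes_per_minute(sample_rate, channels, sample_width_bytes=2):
    """Uncompressed PCM bytes produced by one minute of audio."""
    return sample_rate * channels * sample_width_bytes * 60

stereo_44k = bytes_per_minute(44100, 2)
mono_44k = bytes_per_minute(44100, 1)
stereo_48k = bytes_per_minute(48000, 2)

print(stereo_44k)  # 10584000, about 10 MB per minute
print(mono_44k)    # 5292000, about 5 MB per minute
print(stereo_48k)  # 11520000, a little larger
```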

    WAV file size by math:

    The relation between WAV file size and duration is given by

    wav_file_size_in_bytes = (sample rate (44100) * bits per sample (16) * number of channels (2 for stereo) * number of seconds) / 8 (8 bits = 1 byte)

    Source : http://manual.audacityteam.org/o/man/digital_audio.html
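
    Plugging the sample file's numbers into this formula gives a quick sanity check. This is a sketch that assumes 16 bits per sample and ignores the small WAV header; wav_size_bytes is an illustrative name, not a pydub function.

```python
def wav_size_bytes(sample_rate, bits_per_sample, channels, seconds):
    """Estimated size of the raw audio data in bytes."""
    return (sample_rate * bits_per_sample * channels * seconds) // 8

size = wav_size_bytes(44100, 16, 2, 589)
print(size)          # 103899600
print(size // 1024)  # 101464 (KB)
```

    The estimate of about 101,464 KB is close to the observed 101,612 KB. The small gap is plausibly because the duration is truncated to whole seconds.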

    The chunk duration follows from a simple proportion:

    For a duration in seconds (X) we get a WAV file size (Y).
    So what is the duration in seconds (K) for a file size of 10 MB?

    This gives K = X * 10 MB / Y
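
    The proportion can be written as a small helper. This sketch rounds down to whole seconds so the estimated chunk size stays at or under the target; the function name and the choice to round down are assumptions for illustration.

```python
def chunk_duration_sec(total_sec, total_bytes, target_bytes):
    """K = X * target / Y, rounded down to whole seconds."""
    return (total_sec * target_bytes) // total_bytes

k = chunk_duration_sec(589, 103899600, 10000000)
print(k)  # 56

# Estimated size of one chunk of that duration
bytes_per_sec = 103899600 // 589
print(k * bytes_per_sec)  # 9878400
```

    Rounding up instead would give 57 seconds, or 10,054,800 bytes, which is slightly over the 10 MB target.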

    pydub.utils has a function, make_chunks, that splits audio into chunks of a specific duration (in milliseconds). We determine the duration for the desired size using the formula above.

    We use that to create chunks of 10 MB (or close to it) and export each chunk separately. The last chunk may be smaller.
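
    To make the chunking step concrete, the following sketch computes the millisecond boundaries that fixed-length chunking produces, without touching any audio. It only mirrors the idea of slicing into consecutive fixed-length pieces; chunk_boundaries is a hypothetical helper, not part of pydub, and the 56-second chunk length is an assumed example value.

```python
import math

def chunk_boundaries(total_ms, chunk_ms):
    """Return (start_ms, end_ms) pairs covering total_ms in fixed-length steps."""
    count = math.ceil(total_ms / chunk_ms)
    return [(i * chunk_ms, min((i + 1) * chunk_ms, total_ms)) for i in range(count)]

bounds = chunk_boundaries(589000, 56000)
print(len(bounds))  # 11
print(bounds[-1])   # (560000, 589000), the shorter final chunk
```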

    Here is the working code.

    from pydub import AudioSegment
    from pydub.utils import make_chunks

    # Convert the FLAC file to WAV, then load the WAV for splitting
    flac_audio = AudioSegment.from_file("sample.flac", "flac")
    flac_audio.export("audio.wav", format="wav")
    myaudio = AudioSegment.from_file("audio.wav", "wav")

    channel_count = myaudio.channels        # number of channels
    sample_width = myaudio.sample_width     # bytes per sample
    duration_in_sec = len(myaudio) // 1000  # len() is in milliseconds
    sample_rate = myaudio.frame_rate
    bit_depth = sample_width * 8            # bits per sample (16 for this file)

    print("sample_width= {0}".format(sample_width))
    print("channel_count= {0}".format(channel_count))
    print("duration_in_sec= {0}".format(duration_in_sec))
    print("frame_rate= {0}".format(sample_rate))

    # Estimated size of the audio data in bytes
    wav_file_size = (sample_rate * bit_depth * channel_count * duration_in_sec) // 8
    print("wav_file_size =  {0}".format(wav_file_size))

    file_split_size = 10000000  # 10 MB, i.e. 10,000,000 bytes

    # Chunk duration by proportion:
    #   duration_in_sec (X) --> wav_file_size (Y)
    #   K sec               --> file_split_size
    #   K = X * file_split_size / Y
    # Integer division rounds down so each chunk stays under file_split_size.
    chunk_length_in_sec = (duration_in_sec * file_split_size) // wav_file_size
    chunk_length_ms = chunk_length_in_sec * 1000
    chunks = make_chunks(myaudio, chunk_length_ms)

    # Export each chunk as its own WAV file
    for i, chunk in enumerate(chunks):
        chunk_name = "chunk{0}.wav".format(i)
        print("exporting {0}".format(chunk_name))
        chunk.export(chunk_name, format="wav")
    

    Output:

    Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> ================================ RESTART ================================
    >>> 
    sample_width= 2
    channel_count= 2
    duration_in_sec= 589
    frame_rate= 44100
    wav_file_size =  103899600
    exporting chunk0.wav
    exporting chunk1.wav
    exporting chunk2.wav
    exporting chunk3.wav
    exporting chunk4.wav
    exporting chunk5.wav
    exporting chunk6.wav
    exporting chunk7.wav
    exporting chunk8.wav
    exporting chunk9.wav
    exporting chunk10.wav
    >>>
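
    The question also asks whether this logic can be implemented by hand. As one possible sketch, Python's standard wave module can split an uncompressed PCM WAV file by size without pydub. This is a different technique from the pydub code above; the function name, the output naming scheme, and the 44-byte header allowance (the size of a canonical PCM WAV header) are assumptions.

```python
import wave

def split_wav_by_size(path, max_bytes, prefix="part"):
    """Split a PCM WAV file into files of at most max_bytes each."""
    header_allowance = 44  # assumed canonical PCM WAV header size
    names = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frame_size = params.nchannels * params.sampwidth  # bytes per frame
        frames_per_part = (max_bytes - header_allowance) // frame_size
        if frames_per_part <= 0:
            raise ValueError("max_bytes is too small to hold a single frame")
        index = 0
        while True:
            data = src.readframes(frames_per_part)
            if not data:
                break  # reached the end of the source audio
            name = "{0}{1}.wav".format(prefix, index)
            with wave.open(name, "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(data)
            names.append(name)
            index += 1
    return names

# Example call (assumes audio.wav exists in the current directory):
# split_wav_by_size("audio.wav", 10000000, prefix="chunk")
```

    Because the split is computed in whole frames from a byte budget, no duration estimate is needed, and each output file stays within the limit as long as the header allowance holds.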