python numpy audiosegment

How to convert a NumPy array into a pydub.AudioSegment


I have a TTS model and I want to combine the audio clips it generates.

I need a way to convert the model output (a NumPy array) into a pydub.AudioSegment so that I can combine the clips.

This is the model output:

 audio[0].data.cpu().numpy() = array([ 1.90522405e-04,  3.96589050e-04,  4.41852462e-04, ...,
        1.13033675e-05, -1.63643017e-05, -2.01268449e-05], dtype=float32)

This is my function to combine the audio:

from pydub import AudioSegment
from os.path import exists
def creating_one_audio_file(audio):
  if exists("/content/audio_file.wav"):
    sound2 = AudioSegment.from_wav("/content/audio_file.wav")
    combined_sounds = audio + sound2
    combined_sounds.export("/content/audio_file.wav", format="wav")
  else:
    combined_sounds = audio
    combined_sounds.export("/content/audio_file.wav", format="wav")

creating_one_audio_file(audio[0].data.cpu().numpy())

Solution

  • You can use the audiosegment package (a wrapper around pydub.AudioSegment) and its audiosegment.from_numpy_array method, or borrow the underlying implementation from https://github.com/MaxStrange/AudioSegment/blob/master/docs/api/audiosegment.py#L1145