php, ffmpeg, openai-api

Save FFMpeg conversion to PHP variable vs. File System for use with Whisper API?


I just started working on a little demo to translate audio that is captured on the front end as audio/webm using JS and then sent to the back end in a Laravel app. I guess there are JS libraries that can handle the conversion, but I'd rather use a server-side solution with FFmpeg, which is what I am doing.

The back-end code is below. It seems to be working after some experimenting with the PHP Composer package that I'm using, as opposed to the Laravel-specific one that is also available. I'd rather use this one because I have other PHP apps that are not Laravel.

Questions:

  1. With the FFMpeg library, is there a way to capture the converted .mp3 file in a PHP variable in the script, rather than saving it to the file system and then reading it back in later?

  2. For the OpenAI call, I'd like to catch exceptions there also. I just sort of have a placeholder there for now.

    protected function whisper(Request $request) {
    
        $yourApiKey = getenv('OPENAI_API_KEY');
        $client = OpenAI::client($yourApiKey);
    
        $file = $request->file('file');
        $mimeType = $request->file('file')->getMimeType();
        $audioContents = $file->getContent();
    
        try {
    
            FFMpeg::open($file)
            ->export()
            ->toDisk('public')
            ->inFormat(new \FFMpeg\Format\Audio\Mp3)
            ->save('song_converted.mp3');
        }
        catch (EncodingException $exception) {
            $command = $exception->getCommand();
            $errorLog = $exception->getErrorOutput();
        }
    
        $mp3 = Storage::disk('public')->path('song_converted.mp3');
        try {
            $response = $client->audio()->transcribe([
                'model' => 'whisper-1',
                'file' => fopen($mp3, 'r'),
                'response_format' => 'verbose_json',
            ]);
        }
        catch (EncodingException $exception) {
            $command = $exception->getCommand();
            $errorLog = $exception->getErrorOutput();
        }
    
        echo json_encode($response);
    
    }
    

Solution

  • I don't think you can get the converted output directly into a plain variable. But I think what you actually want is for the converted audio to be either:

    1. held in memory; or
    2. stored in files that you don't have to manage.

    Luckily, the OpenAI client library seems to accept a stream resource (i.e. the value returned by fopen). The closest thing to a variable would therefore be the php://temp read-write stream. PHP keeps its contents in memory until they grow beyond 2 MB (configurable via php://temp/maxmemory:NN), at which point it switches to a temp file for storage.

    The beauty of this is:

    1. If the stream is small, PHP handles everything in memory.
    2. If it is large, you don't have to manage the resulting temp files yourself. PHP removes the temp file after use.

    // Open a read-write temporary stream (the mode argument is required).
    $mp3 = fopen('php://temp', 'w+');
    // Write the converted MP3 data into $mp3 here, then rewind($mp3).
    $response = $client->audio()->transcribe([
        'model' => 'whisper-1',
        'file' => $mp3, // pass the stream resource itself
        'response_format' => 'verbose_json',
    ]);
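
As a rough, self-contained illustration of how php://temp behaves, the sketch below writes some placeholder bytes into a temp stream with a deliberately small memory limit, rewinds it, and reads the data back. The variable names and the random placeholder data are assumptions made for the demo; in the real script the bytes would be the converted audio.

```php
<?php
// Placeholder data standing in for converted MP3 bytes (an assumption for this demo).
$audioBytes = random_bytes(4096);

// Keep at most 1 KiB in memory; anything larger goes to a temp file that PHP manages.
$mp3 = fopen('php://temp/maxmemory:1024', 'w+');
fwrite($mp3, $audioBytes);

// The pointer now sits at the end of the data, so rewind before reading.
rewind($mp3);
$readBack = stream_get_contents($mp3);

// True when the round trip returned exactly what was written.
$roundTripOk = ($readBack === $audioBytes);

// Closing the stream also discards any temp file PHP created.
fclose($mp3);

echo $roundTripOk ? "round trip ok\n" : "round trip failed\n";
```

The calling code is the same whether the data stayed in memory or spilled to disk, which is what makes this wrapper convenient here.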
    

    Then you can rewind the $mp3 stream and read it or stream it out. For example:

    // Move the pointer back to the beginning of the temporary storage.
    rewind($mp3);
    
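    // Copy the whole stream to the output buffer / output stream.
    // No length argument is given: a value such as 1024 would stop
    // the copy after 1024 bytes instead of setting a chunk size.
    $output = fopen('php://output', 'w');
    stream_copy_to_stream($mp3, $output);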
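
One detail worth checking when copying streams: the third argument of stream_copy_to_stream is the maximum number of bytes to copy, not a chunk size. The sketch below uses hypothetical in-memory streams ($source, $limitedTarget, $fullTarget are names invented for the demo) to show the difference.

```php
<?php
// Source stream holding 5000 bytes of placeholder data.
$source = fopen('php://temp', 'w+');
fwrite($source, str_repeat('a', 5000));

// With a length of 1024, the copy stops after 1024 bytes.
rewind($source);
$limitedTarget = fopen('php://memory', 'w+');
$limitedBytes = stream_copy_to_stream($source, $limitedTarget, 1024);

// Without a length, the copy runs to the end of the source.
rewind($source);
$fullTarget = fopen('php://memory', 'w+');
$fullBytes = stream_copy_to_stream($source, $fullTarget);

echo "limited: $limitedBytes, full: $fullBytes\n"; // prints "limited: 1024, full: 5000"
```

So to send a whole file, leave the length out and let PHP handle the buffering internally.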
    

    With Laravel, you'd probably need something like this:

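    rewind($mp3);
    // An arrow function cannot contain an echo statement, so use a closure.
    return response()->stream(function () use ($mp3) {
        fpassthru($mp3); // send the rest of the stream to the output
    });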