ruby-on-rails, rails-activestorage, openai-api, rails-activejob, openai-whisper

Active Job can't find an Active Storage attachment: Errno::ENOENT (No such file or directory @ rb_sysopen)


I have a Rails 7.0.4 app with a simple TranscriptionsController. Once a form with the title, description and audio_file is submitted, an Active Job is kicked off to transcribe the audio using the openai gem. I keep getting this error even though the URL being produced points to the audio_file I want transcribed: Errno::ENOENT (No such file or directory @ rb_sysopen)

Here is the Transcription model:

class Transcription < ApplicationRecord
  has_one_attached :audio_file

  def audio_file_url
    Rails.application.routes.url_helpers.url_for(audio_file) if audio_file.attached?
  end
end

Here is the create action in the TranscriptionsController:

def create
  @transcription = Transcription.new(transcription_params)

  respond_to do |format|
    if @transcription.save
      # Process audio file using OpenAI's Whisper API
      TranscriptionJob.perform_later(@transcription)

      format.html { redirect_to transcription_url(@transcription), notice: "Transcription was successfully created. Check back later for transcription text." }
      format.json { render :show, status: :created, location: @transcription }
    else
      format.html { render :new, status: :unprocessable_entity }
      format.json { render json: @transcription.errors, status: :unprocessable_entity }
    end
  end
end

And here is the most important part, the Active Job called transcription_job.rb:

class TranscriptionJob < ApplicationJob
  queue_as :default
  require 'openai'

  def perform(transcription)
    # Do something later
    client = OpenAI::Client.new(access_token: Rails.application.credentials.dig(:openai, :api_key))
    logger.info "SMC DEBUG: OpenAI Key set"

    file_path = transcription.audio_file_url

    logger.info "SMC DEBUG: Audio file path is: #{file_path}"

    response = client.transcribe(
      parameters: {
        model: "whisper-1",
        file: File.open(file_path, "rb")
      }
    )
    logger.info "SMC DEBUG: transcription sent"
    # Update the transcription object with the returned text
    transcription.transcriptionresult = response['text']
    logger.info "SMC DEBUG: TranscriptionResult is: #{transcription.transcriptionresult}"

    # Save the updated transcription object
    transcription.save
  end
end

Here is the relevant output from the Rails logs for the Active Job:

15:21:58 web.1  | [ActiveJob] Enqueued ActiveStorage::AnalyzeJob (Job ID: 7b3600ec-ee85-4be6-8faa-18f467fc719a) to Async(default) with arguments: #<GlobalID:0x00000001073c6880 @uri=#<URI::GID gid://transcriptionservice/ActiveStorage::Blob/59>>
15:21:58 web.1  | [ActiveJob] Enqueued TranscriptionJob (Job ID: 974566f7-7d83-48ee-b3ae-d6f02a006efb) to Async(default) with arguments: #<GlobalID:0x00000001073d7ba8 @uri=#<URI::GID gid://transcriptionservice/Transcription/61>>
15:21:58 web.1  | Redirected to http://localhost:3000/transcriptions/61
15:21:58 web.1  | Completed 302 Found in 60ms (ActiveRecord: 13.1ms | Allocations: 24625)
15:21:58 web.1  | 
15:21:58 web.1  | 
15:21:58 web.1  | Started GET "/transcriptions/61" for ::1 at 2023-05-14 15:21:58 -0400
15:21:58 web.1  | [ActiveJob] [ActiveStorage::AnalyzeJob] [7b3600ec-ee85-4be6-8faa-18f467fc719a]   ActiveStorage::Blob Load (2.2ms)  SELECT "active_storage_blobs".* FROM "active_storage_blobs" WHERE "active_storage_blobs"."id" = $1 LIMIT $2  [["id", 59], ["LIMIT", 1]]
15:21:58 web.1  | [ActiveJob] [ActiveStorage::AnalyzeJob] [7b3600ec-ee85-4be6-8faa-18f467fc719a] Performing ActiveStorage::AnalyzeJob (Job ID: 7b3600ec-ee85-4be6-8faa-18f467fc719a) from Async(default) enqueued at 2023-05-14T19:21:58Z with arguments: #<GlobalID:0x0000000107426ac8 @uri=#<URI::GID gid://transcriptionservice/ActiveStorage::Blob/59>>
15:21:58 web.1  | [ActiveJob] [ActiveStorage::AnalyzeJob] [7b3600ec-ee85-4be6-8faa-18f467fc719a]   Disk Storage (0.6ms) Downloaded file from key: 49zqmrx6g54pk1pzuc3y2uyvtiyx
15:21:58 web.1  | [ActiveJob] [TranscriptionJob] [974566f7-7d83-48ee-b3ae-d6f02a006efb]   Transcription Load (0.3ms)  SELECT "transcriptions".* FROM "transcriptions" WHERE "transcriptions"."id" = $1 LIMIT $2  [["id", 61], ["LIMIT", 1]]
15:21:58 web.1  | [ActiveJob] [TranscriptionJob] [974566f7-7d83-48ee-b3ae-d6f02a006efb] Performing TranscriptionJob (Job ID: 974566f7-7d83-48ee-b3ae-d6f02a006efb) from Async(default) enqueued at 2023-05-14T19:21:58Z with arguments: #<GlobalID:0x000000010743ca80 @uri=#<URI::GID gid://transcriptionservice/Transcription/61>>
15:21:58 web.1  | [ActiveJob] [TranscriptionJob] [974566f7-7d83-48ee-b3ae-d6f02a006efb] SMC DEBUG: OpenAI Key set
15:21:58 web.1  | Processing by TranscriptionsController#show as TURBO_STREAM
15:21:58 web.1  |   Parameters: {"id"=>"61"}
15:21:58 web.1  | [ActiveJob] [TranscriptionJob] [974566f7-7d83-48ee-b3ae-d6f02a006efb]   ActiveStorage::Attachment Load (7.0ms)  SELECT "active_storage_attachments".* FROM "active_storage_attachments" WHERE "active_storage_attachments"."record_id" = $1 AND "active_storage_attachments"."record_type" = $2 AND "active_storage_attachments"."name" = $3 LIMIT $4  [["record_id", 61], ["record_type", "Transcription"], ["name", "audio_file"], ["LIMIT", 1]]
15:21:58 web.1  | [ActiveJob] [TranscriptionJob] [974566f7-7d83-48ee-b3ae-d6f02a006efb]   ↳ app/models/transcription.rb:5:in `audio_file_url'
15:21:58 web.1  |   Transcription Load (1.7ms)  SELECT "transcriptions".* FROM "transcriptions" WHERE "transcriptions"."id" = $1 LIMIT $2  [["id", 61], ["LIMIT", 1]]
15:21:58 web.1  |   ↳ app/controllers/transcriptions_controller.rb:83:in `set_transcription'
15:21:58 web.1  |   Rendering layout layouts/application.html.erb
15:21:58 web.1  |   Rendering transcriptions/show.html.erb within layouts/application
15:21:58 web.1  |   Rendered transcriptions/_transcription.html.erb (Duration: 0.0ms | Allocations: 25)
15:21:58 web.1  | [ActiveJob] [TranscriptionJob] [974566f7-7d83-48ee-b3ae-d6f02a006efb]   ActiveStorage::Blob Load (1.4ms)  SELECT "active_storage_blobs".* FROM "active_storage_blobs" WHERE "active_storage_blobs"."id" = $1 LIMIT $2  [["id", 59], ["LIMIT", 1]]
15:21:58 web.1  |   Rendered transcriptions/show.html.erb within layouts/application (Duration: 1.1ms | Allocations: 843)
15:21:58 web.1  | [ActiveJob] [TranscriptionJob] [974566f7-7d83-48ee-b3ae-d6f02a006efb]   ↳ app/models/transcription.rb:5:in `audio_file_url'
15:21:58 web.1  | [ActiveJob] [TranscriptionJob] [974566f7-7d83-48ee-b3ae-d6f02a006efb] SMC DEBUG: Audio file path is: http://localhost:3000/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBRQT09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--495ff1b4e7c2001a5ca50886bfd85e2ad6847c7b/test_audio.mp3
15:21:58 web.1  | [ActiveJob] [TranscriptionJob] [974566f7-7d83-48ee-b3ae-d6f02a006efb] Error performing TranscriptionJob (Job ID: 974566f7-7d83-48ee-b3ae-d6f02a006efb) from Async(default) in 25.03ms: Errno::ENOENT (No such file or directory @ rb_sysopen - http://localhost:3000/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBRQT09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--495ff1b4e7c2001a5ca50886bfd85e2ad6847c7b/test_audio.mp3):

When I paste that URL into my browser, it downloads the correct file.

I have tried to use the following to reference the path instead of the URL:

file_path = Rails.application.routes.url_helpers.rails_blob_path(transcription.audio_file, only_path: true)

But this gives me a similar error, pointing to the path instead of the URL:

15:38:35 web.1  | [ActiveJob] [TranscriptionJob] [854d3393-ed0b-41b7-8f5f-7d6469feb8a1] Error performing TranscriptionJob (Job ID: 854d3393-ed0b-41b7-8f5f-7d6469feb8a1) from Async(default) in 134.93ms: Errno::ENOENT (No such file or directory @ rb_sysopen - /rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBRUT09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--f6270379699bf876802dd28de4cb4414cc7f96ba/test_audio.mp3):

I am at a loss for how to address this problem. I would love to find an answer from the brain trust.


Solution

  • Thanks to @Chiperific for pointing me to this edge guide: https://edgeguides.rubyonrails.org/active_storage_overview.html#downloading-files It helped me rewrite my code as follows. File.open only accepts a path on the local filesystem, so neither the blob URL nor the blob route path can be opened directly. The key is to download the attachment to a temporary file on the system and then pass that local path to the OpenAI API.

    class TranscriptionJob < ApplicationJob
      queue_as :default
      require 'openai'

      def perform(transcription)
        # Initialize the OpenAI client with the API key from the Rails credentials
        client = OpenAI::Client.new(access_token: Rails.application.credentials.dig(:openai, :api_key))

        # The attachment must be downloaded to local storage before it can be sent
        # to the OpenAI API. `open` downloads it to a tempfile, yields that file,
        # and deletes it when the block exits, so no extra folder is needed.
        transcription.audio_file.open do |file|
          # Open the local copy in binary mode; the block form closes the handle.
          response = File.open(file.path, "rb") do |audio|
            client.transcribe(parameters: { model: "whisper-1", file: audio })
          end

          # Store the text produced by the OpenAI transcription
          transcription.transcriptionresult = response['text']

          # Save the updated transcription object
          transcription.save
        end
      end
    end
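
To see why the download step matters, here is a minimal plain-Ruby sketch that runs outside Rails. Nothing in it comes from the original app: `try_open` and `with_local_copy` are hypothetical helper names, the URL is a made-up example, and `with_local_copy` only imitates the general behaviour of `audio_file.open` (write the bytes to a tempfile, yield it, remove it afterwards). It is not how Active Storage is implemented.

```ruby
require "tempfile"

# Try to open `location` as a local file. Returns the number of bytes read,
# or the error class if the operating system could not open it.
def try_open(location)
  File.open(location, "rb") { |f| f.read.bytesize }
rescue SystemCallError => e
  e.class
end

# Hypothetical stand-in for `attachment.open`: copy the bytes into a tempfile,
# yield the tempfile, and let Tempfile.create delete it when the block ends.
def with_local_copy(bytes, extension: ".mp3")
  Tempfile.create(["audio", extension]) do |tmp|
    tmp.binmode
    tmp.write(bytes)
    tmp.flush
    yield tmp
  end
end

# A URL is just a string to File.open, which looks for it on disk and fails.
url = "http://localhost:3000/rails/active_storage/blobs/redirect/example/test_audio.mp3"
puts try_open(url).inspect

# The tempfile's path is a real local path, so opening it succeeds.
puts with_local_copy("fake audio bytes") { |tmp| try_open(tmp.path) }
```

The first call fails the same way the original job did, because the URL is interpreted as a relative filesystem path. The second succeeds because the block receives a path that exists on disk for the duration of the block.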