ruby-on-rails amazon-s3 carrierwave delayed-job

Rails, Amazon S3 storage, CarrierWave-Direct and delayed_job - is this right?


I've just discovered that Heroku doesn't have long-term file storage, so I need to move to S3 or similar. There are a lot of new bits and pieces to get my head around, so have I understood correctly how direct upload to S3 using CarrierWave-direct, followed by processing with delayed_job, should work with my Rails app?

What I think should happen if I code this correctly is the following:

  1. I sign up for an S3 account, set up my bucket(s) and get the authentication details etc. that I will need to program in (suitably hidden from my users)
  2. I make sure that the bucket's cross-domain whitelist (CORS) settings don't block my direct uploads (and later downloads)
  3. I use CarrierWave & CarrierWave-direct (or similar) to create my uploads to avoid loading up my app during uploads
  4. S3 will create random access ('filename') information so I don't need to worry about multiple users uploading files with the same name and the files getting overwritten; if I care about the original names I can use metadata to store them.
  5. CarrierWave-direct redirects the user's browser to an 'upload completed' URL after the upload, from where I can either create the delayed_job or pop up the 'sorry, it went wrong' notification.
  6. At this point the user knows that the job will be attempted and they move on to other stuff.
  7. My delayed_job task accesses the file using the S3 APIs and can delete the input file when completed.
  8. delayed_job completes and notifies the user in the usual way e.g. an e-mail.

Is that it or am I missing something? Thanks.


Solution

  • You have a good understanding of the process you need. To throw one more layer of complexity at you: you should wrap all of it in Rails' new(er) ActiveJob. ActiveJob simply facilitates background processing inside Rails via the processor of your choosing (in your case delayed_job). You can then create jobs via a Rails generator:

     bin/rails g job process_this_thing
    

    ActiveJob offers a "Rails way" of handling jobs, and it also allows you to switch processors with less hassle.
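
    As a rough sketch of what that looks like in an app, the fragments below select delayed_job as the ActiveJob backend and fill in the job skeleton that the generator creates. The file paths follow Rails conventions; the `upload_id` argument, the `Upload` model and the `process_versions!` method are hypothetical names used only for illustration.

```ruby
# config/application.rb (inside your Application class)
# Tell ActiveJob to hand jobs to delayed_job.
config.active_job.queue_adapter = :delayed_job

# app/jobs/process_this_thing_job.rb
class ProcessThisThingJob < ApplicationJob
  queue_as :default

  # upload_id is a hypothetical id of the record whose file was uploaded.
  def perform(upload_id)
    upload = Upload.find(upload_id)   # hypothetical model
    upload.process_versions!          # hypothetical method that does the work
  end
end

# Somewhere in a controller, once the upload has completed:
# ProcessThisThingJob.perform_later(upload.id)
```

    Switching to another processor later should then mostly be a matter of changing the `queue_adapter` line, since the job class itself does not mention delayed_job.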

    So, you create a CarrierWave uploader (see the CarrierWave docs), then attach that uploader to a model. For carrierwave_direct you need to disassociate the file field from your model's form and move the file field to its own form (use the form URL method provided by carrierwave_direct).

    You can choose to upload the file, then save the record. Or, save the record and then process the file. The set-up process is significantly different depending on which you choose.

    Carrierwave and carrierwave-direct know where to save the file based on the fog credentials you put in the carrierwave initializer and by using the store_dir path, if set, in the uploader.
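
    To make the role of store_dir concrete, here is a minimal plain-Ruby sketch of how an S3 object key could be composed from a store_dir prefix, a random GUID and the original filename, so that two users uploading the same filename do not overwrite each other. The helper `build_upload_key` is hypothetical and only illustrates the idea; carrierwave_direct builds its own key, so check its README for the actual format.

```ruby
require "securerandom"

# Hypothetical helper: compose an object key as "<store_dir>/<guid>/<filename>".
# The guid keeps keys unique; the original filename stays readable at the end.
def build_upload_key(store_dir, filename, guid: SecureRandom.uuid)
  safe_name = File.basename(filename) # drop any directory part of the name
  [store_dir, guid, safe_name].join("/")
end

key_a = build_upload_key("uploads", "report.pdf")
key_b = build_upload_key("uploads", "report.pdf")

puts key_a
puts key_b
puts "distinct keys: #{key_a != key_b}"
```

    Because the original filename is kept as the last path segment, it can be recovered from the key without storing separate metadata.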

    CarrierWave provides the uploader, which defines versions, etc. carrierwave_direct facilitates uploading directly to your S3 bucket and processing versions in the background. ActiveJob, via delayed_job, provides the background processing. Fog is the link between CarrierWave and your S3 bucket.
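
    A configuration sketch of how those pieces fit together is shown below. It assumes the carrierwave, carrierwave_direct, fog-aws and mini_magick gems are in the Gemfile; the environment variable names, the `ImageUploader` class name, the "uploads" directory and the thumbnail size are all placeholders chosen for illustration.

```ruby
# config/initializers/carrierwave.rb
# Fog credentials: this is what connects CarrierWave to the S3 bucket.
CarrierWave.configure do |config|
  config.fog_credentials = {
    provider:              "AWS",
    aws_access_key_id:     ENV.fetch("AWS_ACCESS_KEY_ID"),     # assumed env var
    aws_secret_access_key: ENV.fetch("AWS_SECRET_ACCESS_KEY"), # assumed env var
    region:                ENV.fetch("AWS_REGION")             # assumed env var
  }
  config.fog_directory = ENV.fetch("S3_BUCKET_NAME")           # bucket name
end

# app/uploaders/image_uploader.rb
class ImageUploader < CarrierWave::Uploader::Base
  include CarrierWave::MiniMagick      # provides resize_to_fit
  include CarrierWaveDirect::Uploader  # direct-to-S3 uploads through fog

  # Prefix inside the bucket; avoid using the model id here, because the
  # record may not be saved yet when the file is uploaded directly.
  def store_dir
    "uploads"
  end

  # A version to be generated later, in the background job.
  version :thumb do
    process resize_to_fit: [200, 200]
  end
end
```

    Keeping the credentials in environment variables also fits Heroku, where they can be set as config vars rather than committed to the repository.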

    You should add a boolean flag to your model that is set to true when carrierwave_direct uploads your image and then set to false when the job finishes processing the versions. That way, instead of a broken link (while the job is running and not yet complete), your view will show something like 'this thing is still processing...'.
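
    The flag logic can be sketched in plain Ruby as follows. The `Photo` struct stands in for an ActiveRecord model (in a real app `processing` would be a boolean column added by a migration), and the method names `mark_uploaded`, `mark_processed` and `image_or_placeholder` are hypothetical.

```ruby
# Plain-Ruby stand-in for a model with an image URL and a processing flag.
Photo = Struct.new(:image_url, :processing)

# Would be called from the 'upload completed' action after the direct upload.
def mark_uploaded(photo, key)
  photo.image_url = key
  photo.processing = true
  photo
end

# Would be called at the end of the background job, once versions exist.
def mark_processed(photo)
  photo.processing = false
  photo
end

# View helper: show a placeholder instead of a link that is not ready yet.
def image_or_placeholder(photo)
  photo.processing ? "This thing is still processing..." : photo.image_url
end

photo = Photo.new(nil, false)
mark_uploaded(photo, "uploads/abc/cat.png")
puts image_or_placeholder(photo) # prints "This thing is still processing..."
mark_processed(photo)
puts image_or_placeholder(photo) # prints "uploads/abc/cat.png"
```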

    RailsCast is the perfect resource for completing this task. Check this out: https://www.youtube.com/watch?v=5MJ55_bu_jM