ruby-on-rails amazon-s3 paperclip fog

Reprocessing a large number of Paperclip styles


I have a fairly large number of Paperclip attachments (~270k images) to which I want to add another style. They are all stored on S3 via fog. From initial testing and some back-of-the-napkin calculations, reprocessing them would take about two weeks, which really isn't feasible.

rake paperclip:refresh:missing_styles

This feels like the obvious choice, but it seems that it will try to download all styles for each attachment to figure out whether one is in fact missing. Since I know that the new style is always missing, this seems redundant.

So far I am thinking of splitting the workload over 10 or so workers:

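NUM_WORKERS = 10
# Integer division: assumes the total divides evenly by NUM_WORKERS
PER_WORKER = 270_000 / NUM_WORKERS

ranges = []
start = 1

# Build one { start, batch } pair per worker
NUM_WORKERS.times do
  ranges << { start: start, batch: PER_WORKER }
  start += PER_WORKER
end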

and running one rake task for each range using the ActiveRecord batch API.
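As a rough sketch of the range-splitting idea, the helper below builds explicit start/finish id pairs and uses ceiling division so the last range still covers the tail when the total is not an exact multiple of the worker count. The method name `worker_ranges` and the `Photo`, `image`, and `large` names in the comment are placeholders, not part of the original setup. It also assumes ids run roughly from 1 to the total; with gaps in the id sequence the ranges will hold unequal numbers of records.

```ruby
# Split the id space 1..total into contiguous, non-overlapping ranges,
# one per worker. Ceiling division keeps the tail when total is not an
# exact multiple of the worker count.
def worker_ranges(total, num_workers)
  per_worker = (total + num_workers - 1) / num_workers
  (1..total).step(per_worker).map do |first|
    { start: first, finish: [first + per_worker - 1, total].min }
  end
end

ranges = worker_ranges(270_000, 10)

# Hypothetical use inside one rake task per range. Photo, image and :large
# are placeholder names; find_each with start:/finish: assumes Rails 5+.
#
#   Photo.find_each(start: range[:start], finish: range[:finish]) do |photo|
#     photo.image.reprocess!(:large)
#   end
```

Passing a style to `reprocess!` inside the loop is what would avoid regenerating the styles that already exist.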

So my questions are:

  1. Are there any ways to improve this, or lessons from previous experience?
  2. Is it possible to skip the existing styles and generate only the new one? Maybe refresh:thumbnails with STYLES is a better approach.

Thank you in advance

EDIT:

I ended up writing a rake task that queues every attachment on a low-priority Sidekiq queue, plus a worker that dequeues and processes these jobs. So far this is working well. It is not very fast, but it is out of my way and runs in the background in a satisfactory manner. This approach can also be parallelized easily by adding more Rails instances, since each one comes with its own set of Sidekiq workers.
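The actual rake task and worker were not posted, so the following is only a guess at what such a worker might look like. The class name `ReprocessStyleWorker`, the `low` queue name, and the model, attachment, and style arguments are all hypothetical. Sidekiq is mixed in only when the gem is loaded, so the class can also be exercised in plain Ruby.

```ruby
# Hypothetical worker: the class name, queue name, and all argument values
# are assumptions for illustration, not the original poster's code.
class ReprocessStyleWorker
  # Mix in Sidekiq only when the gem is loaded.
  include Sidekiq::Worker if defined?(Sidekiq::Worker)
  sidekiq_options queue: :low if respond_to?(:sidekiq_options)

  # Arguments are plain strings and integers so they serialize to JSON.
  def perform(model_name, record_id, attachment_name, style)
    record = Object.const_get(model_name).find_by(id: record_id)
    return if record.nil? # the record was deleted after being queued

    # Regenerate only the requested style for this attachment.
    record.public_send(attachment_name).reprocess!(style.to_sym)
  end
end

# Hypothetical enqueueing from a rake task (Photo is a placeholder model):
#
#   Photo.select(:id).find_each do |photo|
#     ReprocessStyleWorker.perform_async("Photo", photo.id, "image", "large")
#   end
```

Enqueueing only the id keeps each job small, and the worker loads the record fresh when it runs.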


Solution

  • As per this guide you can manually reprocess only a certain style thus:

    my_model.an_attachment.reprocess!(:a_certain_style)
    

    Your method of splitting the workload seems feasible.

    I remember seeing ads for a service that processes images by pulling from and pushing straight to your S3 storage; maybe that would be a better long-term solution than doing the heavy work yourself. I don't remember the name of the service, though.
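To make the single-style `reprocess!` call above more concrete for a long run, here is a minimal sketch that applies it to a collection of records and collects failures instead of aborting on the first error. The helper name `reprocess_style` and the `Photo`, `image`, and `large` names are assumptions for illustration. The only Paperclip call it relies on is `reprocess!` with a style argument, as shown in the answer.

```ruby
# Reprocess a single style for each record and collect failures instead of
# aborting the whole run. `records` is anything that responds to `each`,
# for example the enumerator returned by ActiveRecord's find_each.
def reprocess_style(records, attachment:, style:)
  failures = []
  records.each do |record|
    begin
      record.public_send(attachment).reprocess!(style)
    rescue StandardError => e
      # Remember which record failed so it can be retried later.
      failures << { id: record.id, error: e.message }
    end
  end
  failures
end

# Hypothetical usage (Photo, :image and :large are placeholder names):
#
#   failed = reprocess_style(Photo.find_each, attachment: :image, style: :large)
```

With roughly 270k attachments, a single failed S3 transfer would otherwise stop the whole task, so returning the failed ids makes a retry pass straightforward.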