My site has a large number of graphs which are recalculated each day as new data is available. The graphs are stored on Amazon S3 using active_storage
. A typical example would be
# app/models/graph.rb
class Graph < ApplicationRecord
has_one_attached :plot
end
and in the view
<%= image_tag graphs.latest.plot %>
where graphs.latest
retrieves the latest graph. Each day, a new graph and attached plot is created and the old graph/plot is deleted.
A number of bots, including from Google and Yandex are indexing the graphs, but then are generating exceptions when the bot returns and accesses the image again at urls like
www.myapp.com/rails/active_storage/representations/somelonghash
Is there a way to produce a durable link for the plot that does not expire when the graph/plot is deleted and then recalculated. Failing this, is there a way to block bots from accessing these plots.
Note that I currently have a catchall at the end of the routes.rb
file:
get '*all', to: 'application#route_not_found', constraints: lambda { |req|
req.path.exclude? 'rails/active_storage'
} if Rails.env.production?
The exclusion of active storage in the catchall is in response to this issue. It is tempting to remove the active_storage exemption, but this might then stop proper active_storage routes.
Maybe I can put something in rack_rewrite.rb
to fix this?
Interesting question.
A simple solution would be to use the send_data
functionality to send the image directly. However, that can have it's own issues, mostly in terms of probably increasing server bandwidth usage (and reducing server performance). However, a solution like that is what you need if you don't want to go through the trouble of the below as far as creating a redirect model goes and the logic around that.
Original Answer
The redirect will require setting up some sort of Redirects::Graph
model. That basically can verify that a graph was deleted and redirect to the new graph instead of the requested one. It would have two fields, a old_signed_id
(biglonghash
) and a new_signed_id
.
Every time you delete
We'll need to populate the redirects model and also add a new entry every time a new graph is created (we should be able to generate the signed_id
from the blob somehow).
For performance, and to avoid tons of redirects in a row which may result in a different error/issue. You'll have to manage the request change. IE: Say you now have a redirect A => B
, you delete B
replacing it with C
now you need a A => C
and B => C
(to avoid an A => B => C
redirect), this chain could get rather long indeed. This can be handled efficiently by just adding the new signed_id => new_id
index and doing a Redirects::Graph.where(new_signed_id: old_signed_id).update_all(new_signed_id: new_signed_id)
to update all the relevant old redirects, whenever you re-generate the graph.
The controller itself is trickier, cleanest method I can think of is to monkey patch the ActiveStorage::RepresentationsController
to add a before_action
that does something like this (may not work as-is, the params[:signed_id] and the representations path may not be right):
before_action :redirect_if_needed
def redirect_if_needed
redirect_model = Redirects::Graph.find_by(old_signed_id: params[:signed_id])
redirect_to rails_activestorage_representations_path(
signed_id: redirect_model.new_signed_id
) if redirect_model.present?
end
If you have version control setup for your database (IE: Papertrail gem or something) you may be able to work out the old_signed_id
and new_signed_id
with a bit of work and build the redirects for the urls currently causing errors. Otherwise, sadly this approach will only prevent future errors, and it may be impossible to get the current broken urls working.
Ideally, though you would update the blob itself to use the new graph instead of the old graph rather than deleting, but not sure that's possible/practical.