ruby-on-rails-5 wget rails-activestorage ruby-on-rails-6

Use wget to make static copy of site with image redirects


I want to use wget to create a static copy of a dynamic site. The images hit a controller and are then redirected to their real (expiring) URLs. I am struggling to find working command-line options for wget. I tried:

wget \
--page-requisites --convert-links --max-redirect=10 \
http://activestorage-test.herokuapp.com

but the image paths are not properly processed.

Is this doable, and if so, how?

Background

The site was created using Ruby on Rails 6. The images on this site are uploaded using Active Storage, so the browser hits a Rails controller and gets redirected when GETting them.

Example

If you can show me a way to use wget to get a static copy of

activestorage-test.herokuapp.com

including the photo, my problem is solved. Random image names for the redirected images are OK and probably necessary.

The app may need a little time to spin up as it is on Heroku's free plan.


Solution

  • This is the combination of options that makes it work:

    wget -mk http://activestorage-test.herokuapp.com
    

    Explanations:

    (Taken from the wget manual)

    -m
    --mirror
        Turn on options suitable for mirroring.  This option turns on recursion and time-stamping, sets
        infinite recursion depth and keeps FTP directory listings.  It is currently equivalent to -r -N -l
        inf --no-remove-listing.
    
    -k
    --convert-links
        After the download is complete, convert the links in the document to make them suitable for local
        viewing.  This affects not only the visible hyperlinks, but any part of the document that links to
        external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content,
        etc.
        ...
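
For reference, here is a sketch of the same command with the short flags spelled out. It relies only on the equivalence stated in the man page excerpt above (`-m` = `-r -N -l inf --no-remove-listing`) plus `-k`. `mirror_cmd` is a hypothetical helper name used for illustration; it only prints the command, so nothing is downloaded when the snippet runs.

```shell
#!/bin/sh
# Print the long-form spelling of `wget -mk <url>`.
# mirror_cmd is a hypothetical helper name, not part of wget.
# The URL is printed unquoted; quote it yourself if it contains shell metacharacters.
mirror_cmd() {
    printf 'wget --recursive --timestamping --level=inf --no-remove-listing --convert-links %s\n' "$1"
}

# Show the command for the site from the question.
mirror_cmd "http://activestorage-test.herokuapp.com"
```

Piping the output to `sh` would run the mirror. The long form should behave the same as `wget -mk`, but it makes it easier to see which option does what when you need to drop or tune one of them.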
    
    

    The image is then stored as a plain file in a directory tree that mirrors the Rails Active Storage URL scheme, e.g.:

    ├── rails
    │   └── active_storage
    │       └── representations
    │           └── eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBLQT09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--192401464645b8679e8fc4b8b8e7423923a4404b
    │               └── eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaDdCam9MY21
    

    This answer is based on @Davebra's comment but strips all options not related to the problem.
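
To check what actually landed on disk, a small sketch like the following lists every file under the `rails/active_storage` directory shown in the tree above. `list_active_storage_files` and `MIRROR_DIR` are hypothetical names for this example, and the default value assumes wget created a directory named after the host, which is its usual behaviour when recursing.

```shell
#!/bin/sh
# List the files wget saved under the Active Storage path of a mirror.
# list_active_storage_files is a hypothetical helper name.
list_active_storage_files() {
    find "$1/rails/active_storage" -type f | sort
}

# Assumption: the mirror lives in a directory named after the host.
MIRROR_DIR="${MIRROR_DIR:-activestorage-test.herokuapp.com}"

if [ -d "$MIRROR_DIR/rails/active_storage" ]; then
    list_active_storage_files "$MIRROR_DIR"
else
    echo "No Active Storage files found under $MIRROR_DIR; run wget first." >&2
fi
```

If the `file` utility is installed, running it on each listed path is one way to confirm whether the saved file really contains image data.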