I want to use wget to create a static copy of a dynamic site. The images hit a controller and are then redirected to their real (expiring) URL. I am struggling to find working command-line options for wget. I tried
wget \
--page-requisites --convert-links --max-redirect=10 \
http://activestorage-test.herokuapp.com
but the image paths are not properly processed.
Is this doable, and if so, how?
The site was created using Ruby on Rails 6. The images on this site are uploaded with Active Storage, so when the browser GETs one of them it hits a Rails controller and is redirected to the actual file URL.
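One way to confirm this redirect behavior is to request a single image URL with redirects disabled and print the server headers (the blob path below is made up for illustration, not a real signed id):

# Fetch one image URL without following redirects; Active Storage
# controllers answer with a 302 and a signed, expiring Location header.
wget --max-redirect=0 -S -O /dev/null \
  "http://activestorage-test.herokuapp.com/rails/active_storage/blobs/SOME_SIGNED_ID/photo.jpg"

The Location header shown there is what a mirroring tool has to follow for every image.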
If you can show me a way to use wget to get a static copy of activestorage-test.herokuapp.com including the photo, my problem is solved. Random image names for the redirected images are OK and probably necessary.
The app may need a little time to spin up as it is on Heroku's free plan.
This is the combination of options that makes it work:
wget -mk http://activestorage-test.herokuapp.com
(The following option descriptions are taken from the wget man page.)
-m
--mirror
    Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.

-k
--convert-links
    After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

...
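If you also want the saved pages to open cleanly from disk, the same command can be extended (a sketch; -E and -e robots=off are standard wget options, but whether you need them depends on the site):

# -m   mirror: recursion + time-stamping, infinite depth
# -k   convert links for local viewing
# -E   append .html/.css to files served as text/html or text/css
# -e robots=off   ignore robots.txt in case it blocks the asset paths
wget -mkE -e robots=off http://activestorage-test.herokuapp.com

Note that -E only fixes HTML and CSS names; the image blobs themselves stay extension-less, as shown below.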
This stores each image as a file without an extension, laid out according to the Rails Active Storage URL scheme, e.g.:
├── rails
│   └── active_storage
│       └── representations
│           └── eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBLQT09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--192401464645b8679e8fc4b8b8e7423923a4404b
│               └── eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaDdCam9MY21
This answer is based on @Davebra's comment but strips all options not related to the problem.
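If you want real image extensions afterwards, a small post-processing loop can add them by sniffing each file's MIME type (a sketch using the file(1) utility; it assumes the mirror directory is named after the host, as wget creates it by default):

# Add an extension to each Active Storage file based on its MIME type.
find activestorage-test.herokuapp.com/rails/active_storage -type f |
while read -r f; do
  case "$(file --brief --mime-type "$f")" in
    image/jpeg) mv "$f" "$f.jpg" ;;
    image/png)  mv "$f" "$f.png" ;;
    image/gif)  mv "$f" "$f.gif" ;;
  esac
done

Bear in mind that renaming files after -k has rewritten the links will break those links, so this is only worth doing if you care about the image files themselves rather than a browsable local copy.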