Download a web page with its images and stylesheets and (optionally) e-mail it


I need to make snapshots of web pages programmatically using PHP and get them into an HTML e-mail.

I tried wget --page-requisites. It downloads everything all right, but it doesn't change the HTML page's source code to point to the downloaded files rather than the online originals. Also, that HTML is of course a long way from displaying properly in an HTML e-mail.

I am interested to know whether there are ready-made solutions for this. I would already be happy with a solution that takes an HTML snapshot and changes the HTML accordingly. Being able to e-mail it would be the icing on the cake.

I control the web pages being snapshot, so I have the possibility to adjust the content to optimize the results.

My server-side platform is PHP, but with very liberal settings: I can execute things like wget and Perl scripts from within PHP. However, I do not have root access and cannot install additional packages or programs.

The task is to make a snapshot of a product page each time somebody places an order, so there is documentation about what the page looked like at the time.


Solution

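  • wget has a -k (--convert-links) option, which rewrites both links and references to embedded content (like images) in the saved HTML so that they point to the downloaded local copies. Combined with the --page-requisites option you already use, this gives a snapshot that can be viewed offline. See e.g. wget advanced use (also here).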

    For the e-mail part of your question, I'm sure you can use one of the existing libraries. For example, PHP has a PEAR package (I do not remember the exact name) to handle HTML e-mails; I'm pretty sure both Perl and Python have something similar.
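The two steps above can be sketched with Python's standard library alone. This is only a rough illustration, under the assumption that a Python 3 interpreter is already available on the server and can be called from PHP in the same way as wget. The function names `wget_command` and `build_snapshot_email`, the flat output folder, and the `snapshot.invalid` Content-ID domain are hypothetical choices made for this example. It also assumes the page uses double-quoted `src` attributes, which should be possible to guarantee since you control the pages.

```python
import mimetypes
from email.message import EmailMessage
from email.utils import make_msgid
from pathlib import Path


def wget_command(url, dest_dir):
    """Build the wget call that saves a page with its assets and local links."""
    return [
        "wget",
        "--page-requisites",   # -p: also fetch images, stylesheets, scripts
        "--convert-links",     # -k: point the saved HTML at the local copies
        "--no-directories",    # -nd: keep all files in one flat folder
        f"--directory-prefix={dest_dir}",  # -P: where to save the files
        url,
    ]


def build_snapshot_email(html_file, sender, recipient, subject):
    """Wrap a saved page and the images next to it into one HTML e-mail."""
    html_file = Path(html_file)
    html = html_file.read_text(encoding="utf-8")

    # Replace each local image reference with a cid: URL so that the mail
    # client loads the image from the message itself.
    images = []
    for path in sorted(html_file.parent.iterdir()):
        mime, _ = mimetypes.guess_type(path.name)
        if not path.is_file() or not mime or not mime.startswith("image/"):
            continue
        if f'"{path.name}"' not in html:
            continue  # image is not referenced by this page
        cid = make_msgid(domain="snapshot.invalid")  # placeholder domain
        html = html.replace(f'"{path.name}"', f'"cid:{cid[1:-1]}"')
        images.append((path, mime, cid))

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("This message contains an HTML snapshot of a page.")
    msg.add_alternative(html, subtype="html")

    # Attach the images as related parts of the HTML alternative.
    html_part = msg.get_payload()[1]
    for path, mime, cid in images:
        maintype, subtype = mime.split("/", 1)
        html_part.add_related(path.read_bytes(), maintype, subtype, cid=cid)
    return msg


# Possible usage (not executed here; URL, folder and addresses are made up):
#   import subprocess, smtplib
#   subprocess.run(wget_command("http://example.com/product", "snap"), check=True)
#   msg = build_snapshot_email("snap/index.html", "shop@example.com",
#                              "archive@example.com", "Order snapshot")
#   smtplib.SMTP("localhost").send_message(msg)
```

Note that this sketch only embeds images. Many mail clients ignore linked stylesheets, so the page's CSS would probably still have to be inlined into the HTML before the result looks right in an e-mail.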