node.js, aws-lambda, amazon-cloudfront

AWS CloudFront + Lambda@Edge: modify HTML content (making all links absolute -> relative)


I (maybe falsely) assumed Lambda@Edge can modify the origin response body, so I wrote a Lambda function like this:

/* this does not work. response.Body is not defined */

'use strict';
exports.handler = (event, context, callback) => {
  var response = event.Records[0].cf.response;
  var data = response.Body.replace(/OLDTEXT/g, 'NEWTEXT');
  response.Body = data;
  callback(null, response);
};

This fails because the origin response body cannot be referenced with this syntax.

Can I modify this script to make it work as I intended, or should I consider using another AWS service?

Background:

We are trying to set up an AWS CloudFront distribution that consolidates access to several websites, like this:

ttp://foo.com/ -> https:/newsite.com/foo/
ttp://bar.com/ -> https:/newsite.com/bar/
ttp://boo.com/ -> https:/newsite.com/boo/

The sites are currently managed by external parties. We want to disable direct public access to foo/bar/boo and leave newsite.com as the only site visible on the internet.

Mapping the origins into a single CloudFront distribution is relatively simple. However, once the current domain names are removed from the web, doing so will break any HTML content that references files by absolute URL.

ttp://foo.com/images/1.jpg
 -> (disable foo.com dns)
  -> image not found

To benefit from CloudFront caching and other features, I want to rewrite all absolute file references in the HTML files as relative URLs, so that

<img src="ttp://foo.com/images/1.jpg">

becomes

<img src="/foo/images/1.jpg">

// (accessed by users as https:/newsite.com/foo/images/1.jpg)
// (maybe I should make it an absolute URL for SEO purposes)

(http is written as ttp here because of a restriction on using the banned domain name foo.com)
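The rewrite described above can be sketched as a plain string transformation, independent of where it ends up running. This is only a minimal sketch: `rewriteAbsoluteLinks` and `HOST_TO_PREFIX` are hypothetical names, the hosts and prefixes are taken from the examples above (using the real `http` scheme), and a regex is an approximation of proper HTML parsing that may miss URLs in scripts or stylesheets.

```javascript
'use strict';

// Hypothetical mapping from legacy host names to path prefixes on newsite.com.
const HOST_TO_PREFIX = {
  'foo.com': '/foo',
  'bar.com': '/bar',
  'boo.com': '/boo',
};

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Rewrite absolute http(s) URLs that point at a known legacy host into
// root-relative URLs under that host's prefix.
function rewriteAbsoluteLinks(html, hostToPrefix = HOST_TO_PREFIX) {
  let result = html;
  for (const [host, prefix] of Object.entries(hostToPrefix)) {
    // Match "http://host" or "https://host" only when the host name ends
    // there (next char is "/", a quote, whitespace, or end of input), so a
    // longer name such as "foo.com.example.org" is left alone.
    const pattern = new RegExp(
      'https?://' + escapeRegExp(host) + '(?=[/"\'\\s]|$)',
      'gi'
    );
    result = result.replace(pattern, prefix);
  }
  return result;
}
```

For example, `rewriteAbsoluteLinks('<img src="http://foo.com/images/1.jpg">')` yields `<img src="/foo/images/1.jpg">`. Subdomains such as `www.foo.com` are not covered unless they are added to the map.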

(edit) I found this AWS blog post, which may be a good hint but feels more convoluted than I expected (setting up a Linux container so I can just use sed to process the HTML files, maybe using S3 as temporary storage). I hope I can find a simpler way: https://aws.amazon.com/blogs/networking-and-content-delivery/resizing-images-with-amazon-cloudfront-lambdaedge-aws-cdn-blog/


Solution

  • From what I have just learned myself, you unfortunately cannot modify the response body within Lambda@Edge. You can only remove the body or replace it entirely. I was hoping to clean up all responses from a legacy site, but a CloudFront Lambda@Edge function does not allow this.

    As the AWS documentation states here:

    When you’re working with the HTTP response, Lambda@Edge does not expose the body that is returned by the origin server to the origin-response trigger. You can generate a static content body by setting it to the desired value, or remove the body inside the function by setting the value to be empty. If you don’t update the body field in your function, the original body returned by the origin server is returned back to viewer.
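    To illustrate what the quoted passage does allow, here is a minimal sketch of an origin-response handler that replaces the body wholesale with static content. The replacement markup and the choice to apply it only to 404 responses are assumptions made for illustration; the lowercase `body` field and the header format follow the Lambda@Edge event structure as I understand it, so check them against the current AWS documentation.

```javascript
'use strict';

// Hypothetical static markup used to replace the origin's body.
const REPLACEMENT_BODY = '<html><body><p>Page not found.</p></body></html>';

// Origin-response handler: the origin's body is not exposed here, so the
// only options are to leave it alone or to replace it entirely.
function handler(event, context, callback) {
  const response = event.Records[0].cf.response;

  // The status may arrive as a string, so normalise before comparing.
  if (String(response.status) === '404') {
    response.body = REPLACEMENT_BODY;
    response.headers['content-type'] = [
      { key: 'Content-Type', value: 'text/html; charset=utf-8' },
    ];
  }

  // Any other response passes through with the origin's body untouched.
  callback(null, response);
}

exports.handler = handler;
```

    Note that nothing in this handler can read what the origin actually sent, which is why a search-and-replace over the original HTML is not possible at this trigger.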