Search code examples
httpwebbrowser-extension

Simulating a remote website locally for testing


I am developing a browser extension. The extension works on external websites we have no control over.

I would like to be able to test the extension. One of the major problems I'm facing is displaying a website 'as-is' locally.

Is it possible to display a website 'as-is' locally?

I want to be able to serve the website exactly as-is locally for testing. This means I want to simulate the exact same HTTP data, including iframe ads, etc.

  • Is there an easy way to do this?

More info:

I'd like my system to act as closely to the remote website as possible. I'd like to run command fetch for example which would allow me to go to the site in my browser (without the internet on) and get the exact same thing I would otherwise (including information that is not from a single domain, google ads, etc).

I don't mind using a virtual machine if this helps.

I figured this was quite a useful thing in testing. Especially when I have a bug I need to reliably reproduce in sites that have many random factors (what ads show, etc).


Solution

  • As was already mentioned, caching proxies should do the trick for you (BTW, this is the simplest solution). There are quite a lot of different implementations, so you just need to spend some time selecting a proper one (according to my experience squid is a good solution). Anyway, I would like to highlight two other interesting options:

    Option 1: Betamax

    Betamax is a tool for mocking external HTTP resources such as web services and REST APIs in your tests. The project was inspired by the VCR library for Ruby. Betamax aims to solve these problems by intercepting HTTP connections initiated by your application and replaying previously recorded responses.

    Betamax comes in two flavors. The first is an HTTP and HTTPS proxy that can intercept traffic made in any way that respects Java’s http.proxyHost and http.proxyPort system properties. The second is a simple wrapper for Apache HttpClient.

    BTW, Betamax has a very interesting feature for you:

    Betamax is a testing tool and not a spec-compliant HTTP proxy. It ignores any and all headers that would normally be used to prevent a proxy caching or storing HTTP traffic.

    Option 2: Wireshark and replay proxy

    Grab all traffic you are interested in using Wireshark and replay it. This I would say it is not that hard to implement required replaying tool, but you can use available solution called replayproxy

    Replayproxy parses HTTP streams from .pcap files opens a TCP socket on port 3128 and listens as a HTTP proxy using the extracted HTTP responses as a cache while refusing all requests for unknown URLs.

    Such approach provide you with the full control and bit-to-bit precise simulation.