(This is a continuation of this question, in which I solved one problem and found another).
I am fetching an HTTPS website using Wget via a WireMock proxy. Here is my fetch command, pointing at a demo secure site:
wget -e use_proxy=yes -e https_proxy=localhost:8100 \
https://www.rottentomatoes.com/
Here is my proxy set-up:
java -jar wiremock-standalone-2.5.1.jar \
--port 8081 --https-port 8100 \
--proxy-all https://www.rottentomatoes.com/ \
--record-mappings \
--root-dir ./proxy-cache \
--verbose
The WireMock on-screen logs say this:
2017-03-27 12:08:09.066 Verbose logging enabled
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2017-03-27 12:08:09.827 Verbose logging enabled
2017-03-27 12:08:09.892 Recording mappings to ./proxy-cache/mappings
port: 8081
https-port: 8100
https-keystore: jar:file:(removed)/wiremock-standalone-2.5.1.jar!/keystore
proxy-all: https://www.rottentomatoes.com/
preserve-host-header: false
enable-browser-proxying: false
record-mappings: true
match-headers: []
no-request-journal: false
verbose: true
The result is:
--2017-03-27 12:08:25-- https://www.rottentomatoes.com/
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:8100... connected.
Failed reading proxy response: Success
Retrying.
--2017-03-27 12:08:26-- (try: 2) https://www.rottentomatoes.com/
Connecting to localhost (localhost)|127.0.0.1|:8100... connected.
Failed reading proxy response: Success
Retrying.
^C
As you can see, the fetch fails, automatically retries, and needs to be cancelled to end.
I have tried adding --preserve-host-header to the WireMock command (standalone docs here), but the result is the same.
I wonder whether the proxy is failing internally because it needs to be pointed at a valid HTTPS certificate store. That said, I would expect the running proxy to output something (even errors), but it appears not to be handling the call at all. An equivalent HTTP call works fine.
Is there something I can do to see why Wget is failing? The error message is not very helpful, and as far as I know I cannot make it more verbose (verbose is on by default in wget).
This behaviour is the same across Alpine 3.4 (in a Docker container) and running on my Ubuntu 14.04 VM. It's also the same across WireMock 2.4.1, 2.5.0 and 2.5.1.
I have also tried pointing my browser's (Firefox's) HTTPS proxy setting at WireMock, and loading the website I'm trying to fetch then fails due to a bad certificate. Interestingly, WireMock does not output anything to stdout, even though Firefox appears to have contacted the remote server.
I wondered if WireMock's built-in keystore was out of date or empty, so learning how to specify a "real" certificate store seemed the next thing worth trying. I used these instructions to convert a browser certificate file to JKS format, and this made no difference to either Wget or Firefox.
I note that my newly created keystore is 955 bytes, whereas the original PEM certificate file is ~260K, so clearly not all of the certificates were added (maybe only the first one was?). FWIW, I used this command:
keytool -import -v -trustcacerts -alias endeca-ca \
-file cacert.pem -keystore truststore.ks
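As far as I know, keytool -import reads only the first certificate from a PEM file that contains several, which would explain the 955-byte keystore. Below is a minimal Python sketch (standard library only) that splits a bundle such as cacert.pem into one file per certificate. The function names and the cert-000.pem naming scheme are hypothetical choices of mine, not part of any tool.

```python
import re
from pathlib import Path
from typing import List

# One PEM certificate block, from its BEGIN line to its END line.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_bundle(bundle_text: str) -> List[str]:
    """Return every certificate in a PEM bundle as a separate PEM string."""
    return [match.group(0) + "\n" for match in PEM_CERT.finditer(bundle_text)]


def write_individual_certs(bundle_path: str, out_dir: str) -> List[Path]:
    """Write each certificate to out_dir as cert-000.pem, cert-001.pem, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, pem in enumerate(split_pem_bundle(Path(bundle_path).read_text())):
        path = out / f"cert-{index:03d}.pem"
        path.write_text(pem)
        written.append(path)
    return written
```

For example, write_individual_certs("cacert.pem", "certs") produces the separate files, and each could then be imported with the same keytool -import -trustcacerts ... -keystore truststore.ks command in a shell loop, giving each file a distinct -alias, since an alias can only be used once per keystore.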
I have added the -verbose and -verbose:jni switches to the java call to prove that something is happening when the HTTPS proxy is required. A veritable essay is printed when I run the Wget command, so I am confident the HTTPS proxy is being hit. Wget also fetches fine in HTTP mode.
I am at the stage where I could try all sorts of things blindly, but I think I first need to get some information out of the JVM about why it is failing. My guess is that WireMock, rather than Wget, is the problem.
I have found an undocumented WireMock option, --print-all-network-traffic, which produces this:
2017-03-27 17:36:51.287 Opened Socket[addr=/127.0.0.1,port=54140,localport=8100]
2017-03-27 17:36:51.397 Incoming bytes: CONNECT www.rottentomatoes.com:443 HTTP/1.1
User-Agent: Wget/1.15 (linux-gnu)
Host: www.rottentomatoes.com:443
2017-03-27 17:36:51.398 Closed Socket[addr=/127.0.0.1,port=54140,localport=8100]
2017-03-27 17:36:51.399 Closed Socket[addr=/127.0.0.1,port=54140,localport=8100]
2017-03-27 17:36:52.400 Opened Socket[addr=/127.0.0.1,port=54142,localport=8100]
2017-03-27 17:36:52.483 Incoming bytes: CONNECT www.rottentomatoes.com:443 HTTP/1.1
User-Agent: Wget/1.15 (linux-gnu)
Host: www.rottentomatoes.com:443
The second section repeats each time Wget retries, but there is still nothing very useful here: the socket is closed straight after the CONNECT request arrives, with no indication of why. I want to know why it is failing.
Are there logging parameters I can add to java -jar, or is there a system-wide error log for Java that I can consult? I have installed VisualVM, but its various outputs do not seem very relevant. I expect exceptions are what I would be most interested in.
I wrote my own proxy in PHP, and observing Wget's behaviour from the proxy side, depending on whether the target was HTTP or HTTPS, revealed my misunderstanding here.
Basically, an HTTP client forwards (plaintext) HTTP requests to a proxy using the standard methods (such as GET or POST), and the proxy can capture these if it wishes (e.g. for playback purposes). This is what WireMock and other similar tools do.
However, if an HTTP client fetches an HTTPS target via a proxy, it seems it is required to use the CONNECT method. The proxy then acts as a traffic exchanger between the two sides: it relays the encrypted data back and forth but cannot decode it.
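To make the difference concrete, here is a simplified Python sketch of the request head a client sends to a forward proxy in each case. The function name proxy_request_head is my own, and real clients such as Wget also send extra headers like User-Agent. The CONNECT form matches what --print-all-network-traffic showed arriving at WireMock.

```python
from urllib.parse import urlsplit


def proxy_request_head(url: str, method: str = "GET") -> str:
    """Build the (simplified) request head a client sends to a forward proxy."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Plain HTTP: the full absolute URL goes in the request line, so the
        # proxy can read, forward and record the whole request.
        return f"{method} {url} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
    if parts.scheme == "https":
        # HTTPS: only host:port is revealed; the real request is sent later,
        # encrypted, through the tunnel the proxy opens.
        authority = f"{parts.hostname}:{parts.port or 443}"
        return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n"
    raise ValueError(f"unsupported scheme: {parts.scheme}")
```

For https://www.rottentomatoes.com/ this yields CONNECT www.rottentomatoes.com:443 HTTP/1.1, exactly the line in the WireMock traffic log.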
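Thus, the likely explanation is that WireMock does not handle this method at all, since it could not record the encrypted data anyway. That fits the --print-all-network-traffic output above, where the socket is closed immediately after the CONNECT request arrives.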
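The tunnelling behaviour can be sketched as a minimal CONNECT handler using only the Python standard library. This is a hypothetical illustration, not how WireMock or my PHP proxy is implemented: it serves one already-accepted client socket, has minimal error handling, and does no authentication. After the 200 reply, the proxy only copies opaque bytes in both directions, so there is nothing it could usefully record.

```python
import select
import socket

BUF = 65536


def read_head(conn: socket.socket) -> bytes:
    """Read from the client until the blank line that ends the request head."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(BUF)
        if not chunk:
            raise ConnectionError("client closed before sending a full head")
        data += chunk
    return data


def handle_connect(client: socket.socket) -> None:
    """Answer one CONNECT request on an accepted socket, then relay bytes."""
    head_part, _, leftover = read_head(client).partition(b"\r\n\r\n")
    request_line = head_part.split(b"\r\n", 1)[0]
    method, target, _version = request_line.split(b" ", 2)
    if method != b"CONNECT":
        client.sendall(b"HTTP/1.1 405 Method Not Allowed\r\nContent-Length: 0\r\n\r\n")
        client.close()
        return
    host, _, port = target.rpartition(b":")
    try:
        upstream = socket.create_connection((host.decode(), int(port.decode())))
    except OSError:
        client.sendall(b"HTTP/1.1 502 Bad Gateway\r\nContent-Length: 0\r\n\r\n")
        client.close()
        return
    client.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
    if leftover:
        upstream.sendall(leftover)  # bytes the client sent right after the head
    # From here on the proxy only shuttles opaque bytes (for HTTPS, the TLS
    # handshake and encrypted records); it never sees the HTTP request inside.
    try:
        while True:
            readable, _, _ = select.select([client, upstream], [], [])
            for src in readable:
                data = src.recv(BUF)
                if not data:
                    return  # either side closed: tear down the tunnel
                (upstream if src is client else client).sendall(data)
    finally:
        upstream.close()
        client.close()
```

A real proxy would call handle_connect from an accept loop, typically one thread or task per connection.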
The one thing that still confuses me is why WireMock offers a --https-port if it cannot record data passing through that port anyway. I will update this post if I discover the answer.
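One possibility, which I have not verified against the WireMock source, is that --https-port makes WireMock itself speak TLS on that port (using the https-keystore shown in the startup log), so a client would address it directly as https://localhost:8100/ rather than using it as a proxy, and in --proxy-all mode WireMock would then see, forward and record the decrypted request. A minimal Python sketch under that assumption follows; certificate checks are disabled only because WireMock's built-in certificate is presumably not trusted for localhost, and the function names are hypothetical.

```python
import ssl
import urllib.request


def insecure_context() -> ssl.SSLContext:
    """TLS context that accepts any certificate. Only for local testing."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_from_wiremock_https(path: str = "/", port: int = 8100) -> bytes:
    """GET https://localhost:<port><path> directly, without any proxy."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore https_proxy and similar env vars
        urllib.request.HTTPSHandler(context=insecure_context()),
    )
    with opener.open(f"https://localhost:{port}{path}") as response:
        return response.read()
```

If the assumption holds, fetch_from_wiremock_https() should return the proxied page and a new mapping should appear under ./proxy-cache/mappings. The Wget equivalent would be to drop the proxy options and fetch https://localhost:8100/ with --no-check-certificate.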