I am trying to create a proxy in a Java application that allows me to modify some aspects of HTTP requests.
To do this, I open a ServerSocket on port 8080, configure a proxy in Mozilla Firefox pointing at that port and, for each connection returned by accept(), handle the accepted socket in a separate thread. So far, so good.
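Roughly, the accept loop looks like this (simplified; handleConnection is just a placeholder for my per-connection logic, which eventually calls obtainResponse shown below):

// Simplified accept loop: one thread per accepted browser connection
public void start() throws IOException {
    ServerSocket serverSocket = new ServerSocket(8080);
    while (true) {
        Socket socket = serverSocket.accept();              // blocks until the browser opens a connection
        new Thread(() -> handleConnection(socket)).start(); // handle each connection in its own thread
    }
}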
To send requests from the browser to the corresponding website, I use the HttpClient library included in Java 11. This is the piece of code where I use that library:
private void obtainResponse(Socket socket, IHttpRequest req, String uri) {
    // Build the outgoing client; for HTTPS targets an SSLContext is created for the host
    HttpClient client = null;
    if (req.isSSL()) {
        SSLContext sslContext = ((SecureConnectionHandler) connHandler).createSSLContext(req.getHost());
        client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(30))
                .priority(1)
                .version(HttpClient.Version.HTTP_2)
                .followRedirects(Redirect.NORMAL)
                .sslContext(sslContext)
                .build();
    } else {
        client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(30))
                .priority(1)
                .version(HttpClient.Version.HTTP_2)
                .followRedirects(Redirect.NORMAL)
                .build();
    }

    // Rebuild the absolute URI requested by the browser
    String protocolAndHost = (req.isSSL() ? "https://" : "http://") + req.getHost();
    if (uri == null) {
        uri = protocolAndHost + req.getRequestedResource();
    } else {
        if (uri.startsWith("/"))
            uri = protocolAndHost + uri;
        System.out.println("Here: " + uri);
    }

    HttpRequest.Builder preRequest = null;
    if (req.getMethod().equalsIgnoreCase("GET")) {
        preRequest = HttpRequest.newBuilder()       // GET request
                .uri(URI.create(uri))
                .GET();
    } else if (req.getMethod().equalsIgnoreCase("POST")) {
        preRequest = HttpRequest.newBuilder()       // POST request
                .uri(URI.create(uri))
                .POST(BodyPublishers.ofString(req.getBody()));
    }

    // Copy the browser's headers, except those the HttpClient manages itself
    for (Header header : req.getHeaders()) {
        if (!header.getKey().equalsIgnoreCase("Host") &&
            !header.getKey().equalsIgnoreCase("Connection") &&
            !header.getKey().equalsIgnoreCase("Content-Length") &&
            !header.getKey().equalsIgnoreCase("Upgrade")) {
            preRequest.setHeader(header.getKey(), header.getValues());
        }
    }

    HttpRequest request = preRequest.build();
    System.err.println("Request to: " + uri);

    HttpResponse<byte[]> response;
    try {
        response = client.sendAsync(request, BodyHandlers.ofByteArray())
                .join();
    } catch (CompletionException ce) {
        System.err.println("Address " + uri + " is unreachable!");
        return;
    }

    HttpHeaders httpHeaders = response.headers();
    Optional<String> locationHeader = httpHeaders.firstValue("Location"); // resource permanently moved
    if (!locationHeader.isEmpty()) {
        System.out.println("Moved permanently to " + locationHeader.get());
        obtainResponse(socket, req, locationHeader.get());
    } else {
        // Rebuild a textual response (status line + headers + blank line) for the browser
        Map<String, List<String>> headers = httpHeaders.map();
        String protocol = response.version().toString().replace("_", ".").replaceFirst("\\.", "/");
        int code = response.statusCode();
        String reasonPhrase = HttpStatus.getStatusText(code);
        var crlf = "\r\n";
        var responseString = protocol + " " + code + " " + reasonPhrase + crlf;
        for (String key : headers.keySet()) {
            responseString += key + ":";
            for (String valor : headers.get(key)) {
                responseString += " " + valor;
            }
            responseString += crlf;
        }
        responseString += crlf; // blank line between headers and body
        writeResponse(socket, response.body(), responseString);
    }
}

private void writeResponse(Socket socket, byte[] streamResponse, String responseHeaders) {
    OutputStream outputStream = null;
    try {
        outputStream = socket.getOutputStream();
        outputStream.write(responseHeaders.getBytes());
        outputStream.write(streamResponse);
        outputStream.flush();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (!socket.isOutputShutdown()) {
                socket.shutdownOutput();
            }
            outputStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
NOTE: IHttpRequest is a class of my own that holds all the information collected from the socket (target host, headers, body if present, etc.).
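For reference, these are the accessors the code above relies on, shown as a minimal sketch (the real class also parses the raw request read from the socket):

// Minimal sketch of the accessors used above; the actual class has more
public interface IHttpRequest {
    boolean      isSSL();                // true when the browser is talking to an HTTPS target
    String       getHost();              // target host
    String       getRequestedResource(); // path (and query string) requested by the browser
    String       getMethod();            // "GET", "POST", ...
    String       getBody();              // request body, if any
    List<Header> getHeaders();           // headers read from the browser (Header exposes getKey()/getValues())
}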
So far I have successfully intercepted all requests directed at plain-HTTP websites. However, I have problems with, for example, https://www.google.com/, which uses HTTP/2 over TLS. When I run the Java application and open that Google page, instead of rendering the website the browser shows me the following:
HTTP/2 200 OK
:status: 200
alt-svc: quic=":443"; ma=2592000; v="46,43,39"
cache-control: private
content-encoding: gzip
content-length: 46058
content-type: text/html; charset=UTF-8
date: Tue, 03 Sep 2019 09:47:34 GMT
Expires: Tue, 03 Sep 2019 09:47:34 GMT
p3p: CP="This is not a P3P policy! See g.co/p3phelp for more info."
server: gws
set-cookie: 1P_JAR=2019-09-03-09; expires=Thu, 03-Oct-2019 09:47:34 GMT; path=/; domain=.google.com; SameSite=none NID=188=XOJkffugf5G8rxNLov_iqqxo-Cq5RCvhwJPNu9tvtzLesZ4q8CE0IDVt9VgCEHZsw-AV0EYaaL8D4d_2Qwb6jXCcss7RydfV9PqQFemN_Ezz0kUjyseDDbJXfrHpmqPR6GIQCnR7bjukfasxg883K9fjnhAaqz6IpUYxoguZx-vazWc; expires=Wed, 04-Mar-2020 09:47:34 GMT; path=/; domain=.google.com; HttpOnly CONSENT=WP.27dd1a; expires=Fri, 01-Jan-2038 00:00:00 GMT; path=/; domain=.google.com
x-frame-options: SAMEORIGIN
x-xss-protection: 0
(↓↓BODY↓↓)
‹����� ÿÔ½ézâȲ (ú¿Ÿ‚ ¢ örÁ²À or PªÚ ° çyÜÞ¾ © $ Æ.ÞåžG¸ßýwþ® »™ š ¶« »× ùöíê ¢ ¤T '' '' ™ 'ß¿ († l / Æj ¢ or õ ßñ7¡ “QOLªV ÞU ¢ üø> Tm' ûÄ´T [L ^] îd * I7Õ Ùê Rçšb ÷ EE i²š¡ / ÜP iÃé0cÉDWE! ËsCò I ™ ZªI_ ‰) ## ™ '* & gš: ¦½ÒŽêê¸oŒ ‚™» † 9 $ vFQmU¶5c´Rˆ (Š © ZÖï 1L§ Üì ¦ÚUMS5W² Û $ # K'¶êæí FOWWrniCÒS- ò + Ú¨ · Åòõ¶ „÷ ñɲá 1 • 'ÙÐ
<< And much more information in the form of raw bytes >>
Do you see anything strange in my code? I know that HTTP/2 compresses the headers into frames, but I assumed that HttpClient handled that internally...
If you need more information let me know :)
Thank you in advance.
The HTTP/2 protocol is a binary protocol. What you are sending back to your browser is an HTTP/1.1 response (even if the protocol used by the HttpClient to obtain the response was HTTP/2).
Yet your status line is HTTP/2 200 OK, which your browser won't be able to understand. You need to send your response as a well-formatted HTTP/1.1 response.
That includes filtering out headers whose keys start with ':', like ':status', as these are HTTP/2-specific response headers (pseudo-headers). Also, forwarding back all response headers without interpreting them might simply not work: writing a full-fledged HTTP proxy is hard.
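A rough sketch of the idea, reusing the names from your own code (response, HttpStatus, writeResponse), so treat it as a starting point rather than a complete fix:

// Always present the response to the browser as HTTP/1.1, regardless of the
// protocol the HttpClient used upstream, and drop the HTTP/2 pseudo-headers.
String statusLine = "HTTP/1.1 " + response.statusCode() + " "
        + HttpStatus.getStatusText(response.statusCode()) + "\r\n";

StringBuilder out = new StringBuilder(statusLine);
response.headers().map().forEach((key, values) -> {
    if (key.startsWith(":")) {
        return; // HTTP/2 pseudo-header such as ":status" - not valid in an HTTP/1.1 response
    }
    for (String value : values) {
        out.append(key).append(": ").append(value).append("\r\n"); // one line per header value
    }
});
out.append("\r\n"); // blank line separating headers from body
writeResponse(socket, response.body(), out.toString());

// Note: headers such as Connection, Transfer-Encoding or Content-Length may
// still need special handling - blindly forwarding them can break the response.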