url = "https://www.lmcu.org/?__cf_chl_jschl_tk__=9c114404052361017d9cfe1247981e24813649c7-1592389426-0-AfP07ha5TxZHf64q5tb5nJf9BJguC4U553-OJzJWivTqfgwYLqUODkXj-XsOjZTwpC71ROxHWx4Xhdp2S0LgAVlKgXpy7KWOex7lkoGBm8mNpBsCeJapdYNWty-X2oHE6gp_TtMfH0dcBabvWr_mXV1djsVR_IGlYJA-wCuZpPTGOozyzN9TFwjMPxU-3o6BIUxTh6DDcHmJ_Bw48EYKGpq6n57bVdeLezEs9PduataW1JUcF4GqLE2EHiUxWGubtS8YgcxkkGin4zitHXENMbFi1kMhxI77LsORzKyhkAD1OkG8fGmV--Cgd3EpxWHtHD5vpoIFFIwX0uGQywPnegs";
HttpURLConnection connection = pingHttpUrl(url);
responseCode = connection.getResponseCode();
public HttpURLConnection pingHttpUrl(String url) throws IOException {
    HttpURLConnection conn = null;
    try {
        conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
        conn.setConnectTimeout(2000);
        conn.setInstanceFollowRedirects(false);
        conn.setReadTimeout(10000);
        conn.connect();
        Thread.sleep(1000);
    } catch (Exception e) {
        logger.error("Caught exception : {}", e.getMessage());
        throw new IOException();
    }
    return conn;
}
This returns response code 503, but the site loads properly in a browser. What could be the issue?
The problem is with the headers of the request. I found that this site, which is hosted behind Cloudflare, requires two headers to be set just right; otherwise you will receive the 503 response:
User-Agent, with the value: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
cookie, which needs to contain cf_clearance, and possibly the other set-cookie values that are returned on the first request.
The cf_clearance value appears to be related to Cloudflare's support for Privacy Pass (https://blog.cloudflare.com/cloudflare-supports-privacy-pass/). It seems to be a means of verifying that a user is human and not a machine, which in turn is bad news for your efforts here. I have a working solution below, but it will be hard to automate, since it requires you to establish a browser session and use the cookie set there in the code. When the cf_clearance cookie expires, you will have to visit the site again and reset the cookie value in the code.
I would also speculate that the User-Agent header of the request is used in generating the required cf_clearance cookie. That would make the cookie more difficult to hijack, since you would have to send the same User-Agent as the browser that made the request when Cloudflare generated the cf_clearance cookie.
I have journaled my investigation here:
When visiting the URL in my browser:
And inspecting the response that the server gives, it turns out that it is in fact returning a 503 as well:
For some reason that I can't make out, the browser is redirected to the below URL instead. I cannot see a Location header in the response, or find this URL anywhere in the response.
I checked with Postman, and sure enough, I got the 503 error there as well. As far as I could tell, the server (or a reverse proxy in front of it) was inspecting the request headers and rejecting the request based on them. I experimented a little, moving headers from the browser request into Postman, and finally figured out that it is the combination of the cookie and User-Agent headers that allows the request to be served.
The User-Agent header apparently is not accepted with the Chrome version specified in the question (Chrome/76); I have it working with version 83 here.
The cookie header is something that the browser populates from my first visit to the site. So that is a bit harder to handle in your code. I tried to fetch it in code with connection.getHeaderField("set-cookie"), but that cookie does not seem to cut it.
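To see which cookies the first response actually hands back, the raw set-cookie values can be parsed with the JDK's java.net.HttpCookie. The following is only a minimal sketch: the class name SetCookieInspector, the method parseSetCookies and the sample cookie names are made up for illustration, and the sample values stand in for whatever the server really returns.

```java
import java.net.HttpCookie;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SetCookieInspector {

    // Collects cookie name -> value pairs from raw set-cookie header values.
    // Each list element is expected to be the value of one set-cookie header.
    public static Map<String, String> parseSetCookies(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (setCookieValues == null) {
            return cookies;
        }
        for (String headerValue : setCookieValues) {
            for (HttpCookie cookie : HttpCookie.parse(headerValue)) {
                cookies.put(cookie.getName(), cookie.getValue());
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Hypothetical header values; a real run would take them from the response.
        List<String> sample = List.of(
                "example_id=abc123; Path=/; HttpOnly",
                "session=xyz; Path=/");
        Map<String, String> cookies = parseSetCookies(sample);
        System.out.println("Cookies received: " + cookies.keySet());
        System.out.println("cf_clearance present: " + cookies.containsKey("cf_clearance"));
    }
}
```

If cf_clearance is missing from the parsed names, that would be consistent with what I saw: the cookie from the plain first request is not enough on its own.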
But! I was able to make the code work when taking the cookie from my browser and setting it manually in code, along with the User-Agent:
public HttpURLConnection pingHttpUrl(String url) throws IOException {
    HttpURLConnection conn = null;
    try {
        conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        // This one does not work, apparently because of the Chrome version
        // conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
        conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36");
        conn.addRequestProperty("cookie", "<cookie value from the browser, from the header on a successful request>");
        conn.setConnectTimeout(2000);
        conn.setInstanceFollowRedirects(false);
        conn.setReadTimeout(10000);
        conn.connect();
        Thread.sleep(1000);
    } catch (Exception e) {
        System.out.println(String.format("Caught exception : %s", e.getMessage()));
        // Keep the original exception as the cause so the failure can be diagnosed
        throw new IOException(e);
    }
    return conn;
}
I later found out that it is the value of the cf_clearance key in the cookie that makes the difference.
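Since only cf_clearance seems to matter, the cookie string copied from the browser can be trimmed down to that single pair before it is passed to addRequestProperty. This is a rough sketch under that assumption; the class name ClearanceCookie, its two methods and the sample cookie string are hypothetical and not part of any library.

```java
public class ClearanceCookie {

    // Returns the value of the named cookie from a "k1=v1; k2=v2" style
    // cookie header string, or null if the name is not present.
    public static String extractCookie(String cookieHeader, String name) {
        if (cookieHeader == null) {
            return null;
        }
        for (String pair : cookieHeader.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq > 0 && trimmed.substring(0, eq).equals(name)) {
                return trimmed.substring(eq + 1);
            }
        }
        return null;
    }

    // Builds a cookie header value that carries only cf_clearance,
    // or returns null if the browser cookie string does not contain it.
    public static String clearanceOnlyHeader(String browserCookieHeader) {
        String value = extractCookie(browserCookieHeader, "cf_clearance");
        return value == null ? null : "cf_clearance=" + value;
    }

    public static void main(String[] args) {
        // Made-up cookie string; paste the real one from the browser's request headers.
        String fromBrowser = "example_id=abc123; cf_clearance=token-from-browser; other=1";
        System.out.println(clearanceOnlyHeader(fromBrowser));
    }
}
```

The result would then replace the placeholder in conn.addRequestProperty("cookie", ...). The value still has to be refreshed from a browser session whenever the cookie expires.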