java http-redirect http-status-code-301 google-http-client

Google HTTP Client Library for Java throws HttpResponseException: 301 Moved Permanently

I have a problem with the Google HTTP Client Library for Java (1.22.0). This is my code:

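import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpRequest;
import com.google.api.client.http.HttpResponse;
import com.google.api.client.http.apache.ApacheHttpTransport;

String url = "http://gazetapraca.pl/ogl/2502758";
GenericUrl genericUrl = new GenericUrl(url);
ApacheHttpTransport apacheHttpTransport = new ApacheHttpTransport();
HttpRequest httpRequest = apacheHttpTransport.createRequestFactory().buildGetRequest(genericUrl);
httpRequest.setFollowRedirects(true);
HttpResponse httpResponse = httpRequest.execute();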

and httpRequest.execute() throws

     com.google.api.client.http.HttpResponseException: 301 Moved Permanently

Below is the flow captured with Wireshark:

GET /ogl/2502758 HTTP/1.1
Accept-Encoding: gzip
User-Agent: Google-HTTP-Java-Client/1.22.0 (gzip)
Host: gazetapraca.pl
Connection: Keep-Alive

HTTP/1.1 301 Moved Permanently
Date: Sat, 26 Nov 2016 22:15:52 GMT
Server: Apache
Location: /ogl/2502758/pakowacz+-+mile+widziane+panie
Content-Length: 0
Set-Cookie: JSESSIONID_JOBS=2f1TffY6JYcb6zvBSrQ72fds7rfdsSnHM3sefw6D31Lfr434bnkDmdLQJLvLFZ6zkYBF!-12116034235597; path=/; HttpOnly
Content-Language: pl
P3P: CP="NOI DSP COR NID PSAo OUR IND"
Vary: User-Agent
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive

GET /ogl/2502758/pakowacz%20-%20mile%20widziane%20panie HTTP/1.1
Accept-Encoding: gzip
User-Agent: Google-HTTP-Java-Client/1.22.0 (gzip)
Host: gazetapraca.pl
Connection: Keep-Alive
Cookie: JSESSIONID_JOBS=2f1TffY6JYcb6zvBSrQ72fds7rfdsSnHM3sefw6D31Lfr434bnkDmdLQJLvLFZ6zkYBF!-12116034235597

HTTP/1.1 301 Moved Permanently
Date: Sat, 26 Nov 2016 22:15:52 GMT
Server: Apache
Location: /ogl/2502758/pakowacz+-+mile+widziane+panie
Content-Length: 0
Content-Language: pl
P3P: CP="NOI DSP COR NID PSAo OUR IND"
Vary: User-Agent
Keep-Alive: timeout=2, max=99
Connection: Keep-Alive

This repeats a few times. Maybe the problem is with the URL, because the Location header is /ogl/2502758/pakowacz+-+mile+widziane+panie while the next GET request is for /ogl/2502758/pakowacz%20-%20mile%20widziane%20panie. With other software and libraries everything works (the Google Chrome browser, Postman (a Chrome add-on), and jsoup (a Java library)).

Does anyone have an idea how to solve the problem?

Solution

This is not your library's fault.

To understand why this problem is occurring, we must first understand the "error" message associated with your problem:

com.google.api.client.http.HttpResponseException: 301 Moved Permanently

So, what does this mean? The last part of the message, "301 Moved Permanently", is an HTTP status code. A status code indicates the outcome of a specific request. In this case the status code was 301, which according to RFC 2616 (section 10.3.2) means:

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs.

So the URL that you are using is no longer valid, and the client has to use the new URL given in the Location response header. The library you are using is smart enough to detect this and issues a new request to the new URL. That's great, but the library re-escapes the URL provided by the Location header before using it for the new request, turning /ogl/2502758/pakowacz+-+mile+widziane+panie into /ogl/2502758/pakowacz%20-%20mile%20widziane%20panie. The server receiving this request treats those two paths as different (even though they arguably should be the same), so it sends another 301 response telling the client to use the '+' form instead of the %20 form. The library rewrites the path again, and the cycle repeats until the library gives up and throws the exception shown above.
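The rewriting described above can be reproduced with the JDK alone. The sketch below is not the library's code; it only assumes that the path is decoded with form-style rules (where '+' means a space) and then percent-encoded again. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

final class PlusDecodingDemo {

    // Form-style decoding: every '+' is turned into a space.
    static String formDecode(String rawPath) {
        try {
            return URLDecoder.decode(rawPath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Percent-encode a decoded path again: spaces become %20.
    static String encodePath(String decodedPath) {
        try {
            return new URI(null, null, decodedPath, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Parse the path as a URI without decoding it: '+' is left untouched.
    static String rawPath(String location) {
        return URI.create(location).getRawPath();
    }

    public static void main(String[] args) {
        String location = "/ogl/2502758/pakowacz+-+mile+widziane+panie";

        // prints /ogl/2502758/pakowacz%20-%20mile%20widziane%20panie
        System.out.println(encodePath(formDecode(location)));

        // prints /ogl/2502758/pakowacz+-+mile+widziane+panie
        System.out.println(rawPath(location));
    }
}
```

The first line of output matches the second request in the Wireshark capture; the second shows that a client which passes the path through unchanged would request exactly what the server asked for.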

Now, why is your library doing this? RFC 3986 lists '+' among the reserved characters of a URI (the "sub-delims"). Reserved characters may carry a special meaning, and in form-encoded data (application/x-www-form-urlencoded) a '+' conventionally stands for a space. That is most likely why the library reads the '+' signs as spaces and writes them back as %20. A site that puts literal '+' characters in a path, without using them for that specific purpose, is therefore relying on every client to leave them untouched.

So, all of this means that you cannot blame the library for this error; the blame lies with the people who developed this site.

The reason this works in your browser and the other tools is that those clients do not seem to re-escape the redirect URL before sending it to the server.
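If you need a client-side workaround, one option is to follow the redirects yourself and pass the Location value along without decoding it. The sketch below uses only JDK classes (java.net.URI and HttpURLConnection) rather than the Google library; the class name, method names, and the hop limit are assumptions made for illustration, and the network call has not been tried against this particular site.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class ManualRedirects {

    // Resolve a Location header against the request URL, leaving '+' as it is.
    static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    // Follow up to maxHops redirects by hand and return the final status code.
    static int fetchStatus(String url, int maxHops) throws IOException {
        String current = url;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // we handle redirects ourselves
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();

            boolean redirect = status == 301 || status == 302 || status == 303
                    || status == 307 || status == 308;
            if (!redirect || location == null) {
                return status;
            }
            current = resolve(current, location);
        }
        throw new IOException("Too many redirects, last URL: " + current);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchStatus("http://gazetapraca.pl/ogl/2502758", 5));
    }
}
```

Here resolve("http://gazetapraca.pl/ogl/2502758", "/ogl/2502758/pakowacz+-+mile+widziane+panie") returns http://gazetapraca.pl/ogl/2502758/pakowacz+-+mile+widziane+panie, so the second request carries the path exactly as the server sent it.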