I'm trying to read the following API text page:
using InputStreamReader and I want to extract the text and print it line by line.
The problem is that the text does not seem to be decoded as UTF-8, so the output is garbled and looks like: ????
The code of the method is the following:
String testURL = "https://api.stackexchange.com/2.2/users?page=1&pagesize=9&fromdate=1221436800&todate=1523318400&order=desc&min=1&max=2000000&sort=reputation&site=stackoverflow";
URL url = null;
try
{
url = new URL(testURL);
} catch (MalformedURLException e1)
{
e1.printStackTrace();
}
InputStream is = null;
try
{
is = url.openStream();
} catch (IOException e1)
{
e1.printStackTrace();
}
try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8")))
{
String line;
while ((line = br.readLine()) != null)
{
System.out.println(line);
}
} catch (MalformedURLException e)
{
e.printStackTrace();
} catch (IOException e)
{
e.printStackTrace();
}
I've tried changing the line
try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8")))
to
try (BufferedReader br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8)))
or to
try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "ISO-8859-1")))
Unfortunately, the issue still persists. I would really appreciate any tips so I can solve this problem. Thank you.
To analyse your problem I tried to download from the given URL with curl (with option -i to see the HTTP response header lines) and got:
Cache-Control: private
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST
Access-Control-Allow-Credentials: false
X-Content-Type-Options: nosniff
Date: Sat, 21 Apr 2018 21:48:42 GMT
Content-Length: 85
▒VJ-*▒/▒▒LQ▒210ЁrsS▒▒▒S▒▒▒▒3KR2▒▒R
K3▒RS▒`J▒sA▒I▒)▒▒E@NIj▒R-g▒▒PP^C
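The line Content-Encoding: gzip tells you that the content is gzip-compressed. The unreadable characters above are the compressed bytes, so this is not a charset problem.
Hence, in your Java program you need to decompress the content before decoding it as text.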
You can do this simply by replacing the line
is = url.openStream();
with
is = new GZIPInputStream(url.openStream());
An even better approach would be to get the actual Content-Encoding and, depending on that, decide whether you need to decompress the content:
URLConnection connection = url.openConnection();
is = connection.getInputStream();
// getContentEncoding() returns null when the header is absent,
// so compare from the constant side to avoid a NullPointerException
String contentEncoding = connection.getContentEncoding();
if ("gzip".equalsIgnoreCase(contentEncoding))
is = new GZIPInputStream(is);
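Putting the pieces together, here is a minimal, self-contained sketch of the header-based approach. The class name GzipAwareReader and the helpers decode, readLines and gzip are hypothetical names chosen for illustration. To keep the example runnable without network access, main compresses a small string in memory instead of calling the API; with a real connection you would pass connection.getInputStream() and connection.getContentEncoding() to readLines.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipAwareReader {

    // Wraps the raw stream in a GZIPInputStream only if the
    // Content-Encoding value says "gzip"; a null value means no encoding.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        return "gzip".equalsIgnoreCase(contentEncoding) ? new GZIPInputStream(raw) : raw;
    }

    // Decompresses if needed, then decodes the bytes as UTF-8 line by line.
    static List<String> readLines(InputStream raw, String contentEncoding) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(decode(raw, contentEncoding), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Test helper (stands in for the server): gzip-compresses UTF-8 text.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulated response body; \u00eb is a non-ASCII character.
        byte[] body = gzip("{\"name\":\"Zo\u00eb\"}\nsecond line");
        for (String line : readLines(new ByteArrayInputStream(body), "gzip")) {
            System.out.println(line);
        }
    }
}
```

The order of the wrappers matters: the bytes are decompressed first and only then decoded with the charset from the Content-Type header (utf-8 here). Changing the charset alone cannot help while the stream is still compressed.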