I'm reading a zip file with UTF-8 encoded contents through a ZipInputStream.
The data comes through OK, but special German characters come out wrong.
Using this page ( http://kellykjones.tripod.com/webtools/ascii_utf8_table.html ) I can see that my code is printing the two individual bytes from the UTF-8 encoding column as separate chars.
i.e. ä is UTF-8 0xC3, 0xA4, and I am getting Ã¤ printed out (which are the 0xC3 and 0xA4 chars). Does anyone have any tips?
private InputStream downloadCsv(final String countryCode) {
    final String url = baseUrl + countryCode.toUpperCase() + ".zip";
    final String fileName = countryCode.toUpperCase() + ".txt";
    BufferedInputStream in = null;
    ZipInputStream zIn = null;
    try {
        in = new BufferedInputStream(new URL(url).openStream());
        zIn = new ZipInputStream(in, Charset.forName("UTF-8"));
        ZipEntry zipEntry;
        while ((zipEntry = zIn.getNextEntry()) != null) {
            if (zipEntry.getName().equals(fileName)) {
                StringBuilder sb = new StringBuilder();
                int c;
                while ((c = zIn.read()) != -1) {
                    sb.append((char) c);
                    System.out.println((char) c + " : " + c);
                }
                return new ByteArrayInputStream(sb.toString().getBytes());
            }
        }
...
more code
...
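
The symptom above is consistent with each byte being cast to a char on its own: ZipInputStream.read() returns one byte at a time, and the Charset passed to the ZipInputStream constructor is used for entry names and comments, not for the entry contents. Below is a minimal sketch of the difference, not taken from the original code; the class name MojibakeDemo and its two helper methods are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Casts every byte to a char on its own, like the read() loop in the question.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // b & 0xFF is the same 0..255 value that InputStream.read() returns
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the whole byte sequence as UTF-8, so multi-byte sequences become one char.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00E4" is the letter a-umlaut; in UTF-8 it is the two bytes 0xC3 0xA4
        byte[] utf8 = "\u00E4".getBytes(StandardCharsets.UTF_8);

        System.out.println(castEachByte(utf8).length()); // 2 chars: U+00C3 and U+00A4
        System.out.println(decodeUtf8(utf8).length());   // 1 char:  U+00E4
    }
}
```

The demo prints lengths rather than the characters themselves so that the result does not depend on the console encoding.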
For the record, I fixed this using @saka1029's advice to use an InputStreamReader, and would mark it as the accepted answer if I could!
I can't promise my code is the cleanest, but it works now:
    BufferedInputStream in = null;
    ZipInputStream zIn = null;
    InputStreamReader zInReader = null;
    try {
        in = new BufferedInputStream(new URL(url).openStream());
        zIn = new ZipInputStream(in);
        ZipEntry zipEntry;
        while ((zipEntry = zIn.getNextEntry()) != null) {
            if (zipEntry.getName().equals(fileName)) {
                StringBuilder sb = new StringBuilder();
                // Decode the entry's bytes as UTF-8 explicitly rather than relying on the platform default charset.
                zInReader = new InputStreamReader(zIn, Charset.forName("UTF-8"));
                int c;
                while ((c = zInReader.read()) != -1) {
                    sb.append((char) c);
                }
                // Encode back to UTF-8 explicitly so the returned bytes are the same on every platform.
                return new ByteArrayInputStream(sb.toString().getBytes(Charset.forName("UTF-8")));
            }
        }
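
For anyone who wants to try the InputStreamReader approach without the download step, here is a self-contained sketch. It is not the code from my project: the class ZipUtf8Reader, the methods readEntryAsUtf8 and zipOf, and the sample entry "DE.txt" are all invented for this example, and the zip is built in memory instead of being fetched from a URL. It assumes the entry contents really are UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipUtf8Reader {

    // Returns the text of the named entry decoded as UTF-8, or null if no such entry exists.
    static String readEntryAsUtf8(InputStream zipData, String entryName) throws IOException {
        // The charset given to ZipInputStream applies to entry names, not entry contents.
        try (ZipInputStream zIn = new ZipInputStream(zipData, StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zIn.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    // The reader is what turns the entry's bytes into chars.
                    Reader reader = new InputStreamReader(zIn, StandardCharsets.UTF_8);
                    StringBuilder sb = new StringBuilder();
                    char[] buf = new char[4096];
                    int n;
                    while ((n = reader.read(buf)) != -1) {
                        sb.append(buf, 0, n);
                    }
                    return sb.toString();
                }
            }
        }
        return null;
    }

    // Builds a one-entry zip in memory so the sketch runs without a network call.
    static byte[] zipOf(String entryName, String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zOut = new ZipOutputStream(out, StandardCharsets.UTF_8)) {
            zOut.putNextEntry(new ZipEntry(entryName));
            zOut.write(text.getBytes(StandardCharsets.UTF_8));
            zOut.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String original = "M\u00FCnchen;K\u00F6ln"; // German umlauts written as escapes
        byte[] zip = zipOf("DE.txt", original);
        String text = readEntryAsUtf8(new ByteArrayInputStream(zip), "DE.txt");
        System.out.println(original.equals(text)); // true when the umlauts survive the round trip
    }
}
```

Reading into a char[] buffer is just a convenience here; the single-char read() loop from my fix behaves the same way, only slower on large files.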