I'm dealing with Java code, and here it is:
public InputStream unzip(InputStream inputStream) throws IOException {
    ZipInputStream zipIn = new ZipInputStream(inputStream);
    zipIn.getNextEntry();
    Scanner sc = new Scanner(zipIn);
    StringBuilder sb = new StringBuilder();
    while (sc.hasNextLine()) {
        sb.append(sc.nextLine());
        sb.append("\n");
    }
    System.out.println(sb);
    zipIn.close();
    InputStream is = fromStringBuffer(sb);
    return (InputStream)is;
}

public static InputStream fromStringBuffer(StringBuilder sb) {
    return new ByteArrayInputStream(sb.toString().getBytes());
}
While I am unzipping the file, some Turkish characters end up in a weird format (like Ü becomes Ü).
How can I get them written to the StringBuilder correctly?
Streams (of the java.io variety, as opposed to java.util.stream) are for reading (or writing) bytes.
Scanner deals with chars. If you pass an InputStream to a Scanner, you need to provide a charset; otherwise it uses the default charset.
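As a minimal sketch of passing the charset explicitly, the following assumes the zipped text is UTF-8 (an assumption; the question does not say which encoding the file uses). The class name ScannerCharsetSketch and the helper zipOf are hypothetical and exist only to make the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ScannerCharsetSketch {

    // Reads the first zip entry as text, decoding with an explicit charset
    // (assumed to be UTF-8 here) instead of the platform default.
    static String readFirstEntry(InputStream in) throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(in)) {
            if (zipIn.getNextEntry() == null) {
                throw new IOException("zip archive has no entries");
            }
            Scanner sc = new Scanner(zipIn, StandardCharsets.UTF_8.name());
            StringBuilder sb = new StringBuilder();
            while (sc.hasNextLine()) {
                sb.append(sc.nextLine()).append('\n');
            }
            return sb.toString();
        }
    }

    // Hypothetical helper for the demo: builds a one-entry zip in memory.
    static byte[] zipOf(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bytes)) {
            zipOut.putNextEntry(new ZipEntry("sample.txt"));
            zipOut.write(text.getBytes(StandardCharsets.UTF_8));
            zipOut.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep the source file independent of its own encoding.
        String original = "\u00DCnl\u00FC \u015Fehir\n";
        String decoded = readFirstEntry(new ByteArrayInputStream(zipOf(original)));
        // Compare the decoded text with what was zipped.
        System.out.println(original.equals(decoded));
    }
}
```

If the file was actually written in a different encoding (for Turkish text, ISO-8859-9 or windows-1254 are plausible candidates), that charset name would have to be passed instead.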
But: this assumes that the byte stream passed to the Scanner actually does represent a stream of chars, using some charset. A ZipInputStream does not, necessarily: it's whatever the contents of the zipped file are. If you say there are characters missing, I presume your zipped file is text; but, from the perspective of reading from the zip file, it's just a stream of bytes.
If you want an InputStream from a ZipInputStream, simply return the ZipInputStream.
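A sketch of that approach might look like the following. It positions the ZipInputStream at the first entry and hands it back untouched, so the bytes are never decoded. The class name UnzipStreamSketch and the helpers readAll and zipOf are hypothetical, added only so the example runs on its own; the caller is assumed to be responsible for closing the returned stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class UnzipStreamSketch {

    // Returns a stream over the raw bytes of the first entry.
    // No charset is involved, so nothing can be mis-decoded here.
    static InputStream unzip(InputStream in) throws IOException {
        ZipInputStream zipIn = new ZipInputStream(in);
        if (zipIn.getNextEntry() == null) {
            zipIn.close();
            throw new IOException("zip archive has no entries");
        }
        return zipIn; // the caller closes it
    }

    // Hypothetical helper: drains a stream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Hypothetical helper for the demo: builds a one-entry zip in memory.
    static byte[] zipOf(byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bytes)) {
            zipOut.putNextEntry(new ZipEntry("sample.txt"));
            zipOut.write(content);
            zipOut.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "\u00DCnl\u00FC\n".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = unzip(new ByteArrayInputStream(zipOf(original)))) {
            // Compare the unzipped bytes with what was zipped.
            System.out.println(Arrays.equals(original, readAll(in)));
        }
    }
}
```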
If you want to interpret the returned stream as chars, of course you will still need to know the charset; but you just won't have introduced unnecessary round-tripping from bytes to chars to bytes here.
If you want all of the charset encoding to be handled inside this method, return a Reader, the analogue of InputStream that represents a stream of chars.
For example, you could return an InputStreamReader, e.g. new InputStreamReader(zipIn, charset). This doesn't absolve you of the need to know the correct charset, but it insulates callers of the method from having to deal with it.
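The Reader variant could be sketched as below. The method name unzipAsReader, the class name UnzipReaderSketch, and the helpers readAll and zipOf are hypothetical; the demo assumes UTF-8 content, which may not match the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class UnzipReaderSketch {

    // Returns a Reader over the first entry, decoding bytes with the given charset.
    static Reader unzipAsReader(InputStream in, Charset charset) throws IOException {
        ZipInputStream zipIn = new ZipInputStream(in);
        if (zipIn.getNextEntry() == null) {
            zipIn.close();
            throw new IOException("zip archive has no entries");
        }
        // Closing the returned Reader also closes the underlying zip stream.
        return new InputStreamReader(zipIn, charset);
    }

    // Hypothetical helper: drains a Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    // Hypothetical helper for the demo: builds a one-entry zip in memory.
    static byte[] zipOf(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bytes)) {
            zipOut.putNextEntry(new ZipEntry("sample.txt"));
            zipOut.write(text.getBytes(charset));
            zipOut.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Charset charset = StandardCharsets.UTF_8; // assumed encoding of the zipped text
        String original = "\u00DCnl\u00FC \u015Fehir\n";
        byte[] zipped = zipOf(original, charset);
        try (Reader reader = unzipAsReader(new ByteArrayInputStream(zipped), charset)) {
            // Compare the decoded text with what was zipped.
            System.out.println(original.equals(readAll(reader)));
        }
    }
}
```

Callers then work with chars only and never see the charset, which is the insulation described above.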