I'm trying to read the HTML source of a web page, but when it contains characters like ã, á, à, õ, I get � instead.
I've tried applying java.nio.charset.Charset.encode to the lines I read, but it made no difference: the same thing occurs.
My code is:
URLConnection connection = ...;
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String s = null;
while ((s = reader.readLine()) != null) {
// got new source line...
}
The site I'm trying to read is this one (PT-BR).
According to the meta tag, the charset on that page is ISO-8859-1, so the bytes have to be decoded with that charset rather than the platform default. Try using:
Scanner scanner = new Scanner(connection.getInputStream(), "ISO-8859-1");
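If you would rather keep the readLine() loop from the question, the same idea should work by passing the charset to an InputStreamReader. Below is a minimal sketch, not a definitive implementation: the class name CharsetAwareReader and the helper readLines are made up for illustration, and a ByteArrayInputStream stands in for connection.getInputStream() so the example runs without a network connection. The non-ASCII characters are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CharsetAwareReader {

    // Hypothetical helper: decodes the stream with the given charset
    // and returns its content split into lines.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for connection.getInputStream(): the words "ação" and "não"
        // encoded as ISO-8859-1, as the page in the question is assumed to be.
        byte[] body = "a\u00e7\u00e3o\nn\u00e3o".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the charset the bytes were written in keeps the accents.
        List<String> good =
            readLines(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1);

        // Decoding the same bytes as UTF-8 yields U+FFFD replacement characters.
        List<String> bad =
            readLines(new ByteArrayInputStream(body), StandardCharsets.UTF_8);

        System.out.println(good);
        System.out.println(bad);
    }
}
```

In your code that would be readLines(connection.getInputStream(), StandardCharsets.ISO_8859_1). The second call in main reproduces the symptom from the question: bytes written in one charset and decoded with another come out as replacement characters.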