Using the standard Java HTTP client (java.net.http.HttpClient), I load the page at this address: https://www.youtube.com/watch?v=ELArlE7gSmw
The title of this YouTube video is in Bulgarian. It appears in the page's meta tags like this:
<meta name="title" content="here is title">
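Once the body is decoded correctly, the title can be read from that tag. Below is a minimal sketch, assuming the tag always has exactly the `name="title" content="..."` attribute order shown above. `MetaTitle` and `extract` are hypothetical names, and HTML entities such as `&amp;` are not unescaped. An HTML parser such as jsoup is more robust than a regex for this.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaTitle {
    // Assumes the tag looks exactly like <meta name="title" content="...">
    private static final Pattern TITLE =
            Pattern.compile("<meta\\s+name=\"title\"\\s+content=\"([^\"]*)\"");

    // Returns the content attribute of the first matching tag, if there is one
    static Optional<String> extract(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<head><meta name=\"title\" content=\"here is title\"></head>";
        System.out.println(extract(html).orElse("(no title)"));
    }
}
```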
I am using the following code to load this page. Pay attention to the encoding (Windows-1251):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.Charset;

public class Application {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(new URI("https://www.youtube.com/watch?v=ELArlE7gSmw"))
                .GET()
                .build();
        HttpClient client = HttpClient.newHttpClient();
        // Decode the response body explicitly as Windows-1251
        HttpResponse<String> response = client.send(request,
                HttpResponse.BodyHandlers.ofString(Charset.forName("Windows-1251")));
        System.out.println(response.body());
    }
}
Decoded as Windows-1251, this tag in the response looks like this:
<meta name="title" content="ЗАХАРОСАН�? ЧЕРВЕН�? ЯБЪЛК�?!!">
If I use UTF-8 instead of Windows-1251, it looks like this:
<meta name="title" content="���������� ������� ������!!">
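The following small offline sketch shows what a mismatched charset does to Cyrillic text. It assumes the page bytes are UTF-8, and it uses a hypothetical sample word written with Unicode escapes, so the example does not depend on the source file's own encoding. Each Cyrillic letter takes two bytes in UTF-8 but only one in Windows-1251, so decoding UTF-8 bytes as Windows-1251 yields two wrong characters per letter. The second UTF-8 byte of И is 0x98, which has no mapping in Windows-1251 and is replaced with U+FFFD. That is consistent with the `�?` at the end of each word in the Windows-1251 output above (presumably where an И should be).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Encode with one charset and decode with another, which is what happens
    // when the decoder does not match the bytes the server actually sent
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("Windows-1251");
        String sample = "\u042F\u0411\u042A\u041B\u041A\u0418"; // hypothetical sample: ЯБЪЛКИ
        // Matching charsets: the text survives unchanged
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // UTF-8 bytes read as Windows-1251: 12 characters of mojibake instead of 6 letters
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8, cp1251));
    }
}
```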
I have also tried other HTTP clients, such as the one in the jsoup library. The result is similar, although the library's online demo displays all tags with Bulgarian content correctly.
How can I decode the HTTP response without errors?
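Passing a charset to `ofString` overrides whatever the server declares. The no-argument `HttpResponse.BodyHandlers.ofString()` instead decodes the body with the charset from the `Content-Type` response header and falls back to UTF-8 when there is none. Below is a sketch of that variant. The `declaredCharset` helper is hypothetical and only reports which charset the server declared, so you can check the assumption that YouTube serves UTF-8 rather than Windows-1251.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class DeclaredCharsetFetch {
    // Hypothetical helper: reads charset=... from Content-Type, defaulting to UTF-8
    static Charset declaredCharset(HttpHeaders headers) {
        String contentType = headers.firstValue("Content-Type").orElse("");
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    break; // illegal or unsupported name: use the default below
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://www.youtube.com/watch?v=ELArlE7gSmw"))
                .GET()
                .build();
        HttpClient client = HttpClient.newHttpClient();
        // No explicit charset: the body is decoded with the server-declared charset
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Declared charset: " + declaredCharset(response.headers()));
        System.out.println(response.body());
    }
}
```

If the body is decoded correctly but the console still shows `�` characters, the remaining problem lies in how the output is displayed, not in how the response is decoded.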
Solved the problem.
In IntelliJ IDEA, go to File > Settings > Editor > File Encodings.
Set both the "Global Encoding" and "Project Encoding" fields to "System Default" (not UTF-8 or Windows-1251, but the default!). After that, the whole output displays correctly.
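Another option besides changing the IDE settings is to make the program's own output encoding explicit. `System.out` otherwise encodes text with a platform default, typically Windows-1251 on a Windows system with a Bulgarian locale. The minimal sketch below assumes the console that displays the output reads it as UTF-8, and `utf8Out` is a hypothetical helper name. It only helps if the body itself was decoded with the correct charset.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Wraps a raw byte stream in an auto-flushing PrintStream that always encodes
    // text as UTF-8, regardless of the platform default charset
    static PrintStream utf8Out(OutputStream raw) {
        return new PrintStream(raw, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Replace System.out so that existing println calls write UTF-8 bytes
        System.setOut(utf8Out(new FileOutputStream(FileDescriptor.out)));
        System.out.println("\u042F\u0411\u042A\u041B\u041A\u0418"); // hypothetical sample: ЯБЪЛКИ
    }
}
```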