I am using ELK stack v8.4.1 and trying to load data from SQL Server into Elasticsearch via Logstash. My source table contains Turkish characters (collation SQL_Latin1_General_CP1_CI_AS). When Logstash writes these characters to Elasticsearch, it converts them to '?'. For example, 'Şükrü' becomes '??kr?'. (I previously used ELK stack v7.* and did not have this problem.)
This is my config file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://my-sql-connection-info;encrypt=false;characterEncoding=utf8"
    jdbc_user => "my_sql_user"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_driver_library => "my_path\mssql-jdbc-11.2.0.jre11.jar"
    statement => [ "Select id,name,surname FROM ELK_Test" ]
    schedule => "*/30 * * * * *"
  }
  stdin {
    codec => plain { charset => "UTF-8" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test_index"
    document_id => "%{id}"
    user => "logstash_user"
    password => "password"
  }
  stdout { codec => rubydebug }
}
I tried both with and without a filter that forces the encoding to UTF-8, but the result does not change:
filter {
  ruby {
    code => 'event.set("name", event.get("name").force_encoding(::Encoding::UTF_8))'
  }
}
Below is my Elasticsearch result:
{
  "_index": "test_index",
  "_id": "2",
  "_score": 1,
  "_source": {
    "name": "??kr?",
    "@version": "1",
    "id": 2,
    "surname": "?e?meci",
    "@timestamp": "2022-09-16T13:02:00.254013300Z"
  }
}
By the way, the console (stdout) output is correct:
{
  "name" => "Şükrü",
  "@version" => "1",
  "id" => 2,
  "surname" => "Çeşmeci",
  "@timestamp" => 2022-09-16T13:32:00.851877400Z
}
I also inserted sample data directly from Kibana Dev Tools, and it was stored without any problem. Can anybody help, please? What could be wrong? What can I check?
The solution was to change the JDK version: I replaced the bundled OpenJDK with Oracle JDK 19, and the problem was solved.
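A possible explanation, which this thread does not confirm: replacement with '?' is the usual symptom of text being encoded into a charset that cannot represent the characters, and JDK 18 and later use UTF-8 as the default charset (JEP 400), whereas earlier JDKs derive the default from the OS locale. The standalone Ruby sketch below (run outside Logstash) illustrates two points under those assumptions: `force_encoding` only relabels a string and leaves its bytes untouched, so it cannot help when the event already holds valid UTF-8 (as the correct console output suggests), and a lossy encode step reproduces exactly the '?' pattern seen in Elasticsearch. The helper `lossy_encode` is hypothetical, and US-ASCII is chosen only for illustration; which charset was actually involved is not known.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of Logstash): encode text into a target
# charset, substituting "?" for every character the charset cannot
# represent, then return the result as a UTF-8 string for easy comparison.
def lossy_encode(text, charset)
  text.encode(charset, invalid: :replace, undef: :replace, replace: "?")
      .encode("UTF-8")
end

# Relabel a copy of the string as UTF-8 and report whether its bytes changed.
def bytes_changed_by_force_encoding?(text)
  relabeled = text.dup.force_encoding(Encoding::UTF_8)
  relabeled.bytes != text.bytes
end

name = "Şükrü"
surname = "Çeşmeci"

# force_encoding changes only the encoding label, never the bytes.
puts bytes_changed_by_force_encoding?(name)   # prints "false"

# Encoding into a charset without Turkish letters loses them.
# US-ASCII is an assumption made purely for this demonstration.
puts lossy_encode(name, "US-ASCII")           # prints "??kr?"
puts lossy_encode(surname, "US-ASCII")        # prints "?e?meci"
```

If the same mechanism is at work here, the fix has to happen at the point where the JVM picks a charset for the outgoing data, which would be consistent with the JDK change resolving it while the Ruby filter had no effect.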