java utf-8 windows-1252

Java convert Windows-1252 to UTF-8, some letters are wrong


I receive data from an external Microsoft SQL Server 2008 database (I run the queries with MyBatis). The data is encoded as "Windows-1252".

I have tried to re-encode to UTF-8:

String textoFormado = ...value from MyBatis... ; 
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");

Almost the whole string is correctly decoded, but some letters with accents are not.

For example:

  1. I received this: �vila
  2. The code above makes: �?vila
  3. I expected: Ávila
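A hedged illustration of why re-encoding an existing `String` is unlikely to repair it: the characters are fixed at the moment the raw bytes are decoded, so the charset has to be right at that step. The sketch below assumes the raw bytes really are Windows-1252; `CharsetDemo` and `decode` are hypothetical names, not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: decode raw bytes with an explicitly named charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // "Ávila" as Windows-1252 bytes: Á is the single byte 0xC1.
        byte[] raw = {(byte) 0xC1, 'v', 'i', 'l', 'a'};

        String right = decode(raw, cp1252);                  // matching charset
        String wrong = decode(raw, StandardCharsets.UTF_8);  // mismatched charset

        System.out.println(right.equals("\u00C1vila"));  // true
        // 0xC1 is never valid in UTF-8, so it becomes the replacement character U+FFFD.
        System.out.println(wrong.equals("\uFFFDvila"));  // true

        // Once decoded correctly, encoding to UTF-8 is a separate, lossless step.
        byte[] utf8 = right.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);  // 6: Á takes two bytes in UTF-8
    }
}
```

Once a character has been replaced by U+FFFD, the original byte is gone, so no later `getBytes`/`new String` round trip can bring it back.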

Solution

  • I solved it, thanks to everyone.

    I have the following project structure:

    • MyBatisQueries: a query with a "select" that returns the String
    • A Pojo that stores the String (it gave me the String with the conversion problems)
    • The class that uses the query and the Pojo with the data (this is where the text showed up badly decoded)

    At first I had this (MyBatis and Spring inject the dependencies and params):

    public class Pojo {
        private String params;

        // The value arrives here as an already-decoded String
        public void setParams(String params) {
            this.params = params;
        }
    }
    

    The solution:

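    import java.io.UnsupportedEncodingException;

    public class Pojo {
        private String params;

        // Take the raw bytes and decode them explicitly
        public void setParams(byte[] params) {
            try {
                this.params = new String(params, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                // "UTF-8" is always supported, so this branch should not be reached
                this.params = null;
            }
        }
    }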
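
A possible variant of the byte-array setter, offered as a sketch rather than the author's code: on Java 7 or later, passing a `Charset` constant instead of a charset name removes the checked `UnsupportedEncodingException`. The class name `PojoWithCharset`, the null check, and the getter are assumptions added for illustration; the charset must match whatever bytes the driver actually delivers.

```java
import java.nio.charset.StandardCharsets;

public class PojoWithCharset {
    private String params;

    // Decode the raw bytes explicitly; the Charset overload throws no checked exception.
    public void setParams(byte[] params) {
        // Assumption: a null argument (e.g. a NULL column) should yield a null field.
        this.params = (params == null) ? null : new String(params, StandardCharsets.UTF_8);
    }

    // Hypothetical getter, added only so the result can be inspected.
    public String getParams() {
        return params;
    }
}
```

For example, calling `setParams` with the UTF-8 bytes `C3 81 76 69 6C 61` leaves `getParams()` returning "Ávila".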