Decode an Alfresco file name or replace Unicode escapes such as [_x0020_] in a String/file name


I am using the Alfresco upload and download services from Java.

When I upload a file to the Alfresco server, it gives me the following path:

/app:Home/cm:Company_x0020_Home/cm:Abc/cm:TestFile/cm:V4/cm:BC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf

When I download the file through the Alfresco services using the same path, I take the file name from the end of the path, i.e.

ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf

How can I remove or decode the [Unicode] characters in the file name? I tried:

String decoded = URLDecoder.decode(queryString, "UTF-8");

The above does not work.
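
A likely reason, shown here as a minimal sketch: URLDecoder only translates percent-encoded sequences (%XX) and '+', so the _xHHHH_ escapes in the file name pass through unchanged. The class name UrlDecoderCheck and the helper urlDecode are hypothetical names used only for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecoderCheck {
    // Thin wrapper around URLDecoder.decode that hides the checked exception.
    // (Hypothetical helper, not part of any Alfresco API.)
    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String name = "ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf";
        // No '%' or '+' in the name, so nothing is decoded.
        System.out.println(urlDecode(name).equals(name)); // true
        // A percent-encoded space, by contrast, is decoded.
        System.out.println(urlDecode("ABC1X%200400"));     // ABC1X 0400
    }
}
```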

These are some of the Unicode characters that appear in my file name: https://en.wikipedia.org/wiki/List_of_Unicode_characters

Please do not mark this question as a duplicate. I have gone through the links below, but none of them gave a solution. These are the questions I found about replacing Unicode characters in a Java String:

Java removing unicode characters

Remove non-ASCII characters from String in Java

How can I replace a unicode character in java string

Java Replace Unicode Characters in a String


Solution

  • The solution given by Jeff Potts is perfect, but I needed the file name in a different project where I do not use any org.alfresco jars.

    Pulling in all of those dependencies just to decode a file name was too much, so I used plain Java with a regex to parse the file name and decode it. This gave me exactly the same result as calling

    ISO9075.decode(test);
    

    This is the code I used. The pattern accepts only escapes of the form _xHHHH_, where HHHH is four hexadecimal digits, and leaves everything else untouched:

    // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    public String decode_FileName(String fileName) {
        System.out.println("fileName : " + fileName);
        // Regex that matches escapes such as _x0020_: "_x", four hex digits, "_"
        Matcher m = Pattern.compile("_x([0-9A-Fa-f]{4})_").matcher(fileName);
        StringBuffer decoded = new StringBuffer();
        while (m.find()) {
            // Parse the four hex digits as a character code, e.g. "0028" -> '('
            char replaceChar = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '$' and '\' from being read as group references
            m.appendReplacement(decoded, Matcher.quoteReplacement(String.valueOf(replaceChar)));
        }
        // Copy whatever follows the last escape
        m.appendTail(decoded);
        String decodedfileName = decoded.toString();
        System.out.println("Decoded FileName :" + decodedfileName);
        return decodedfileName;
    }
    

    The core of the code is a single conversion.

    Example:

    0028 is the code of the left parenthesis (U+0028), as you can see at https://en.wikipedia.org/wiki/List_of_Unicode_characters

    String replace_char = String.valueOf((char) Integer.parseInt("0028", 16));
    System.out.println(replace_char);
    

    This code prints ( which is a left parenthesis.

    This is the logic I used in my Java program.

    The program above gives the same result as ISO9075.decode(test).

    Output :
    
    fileName : ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
    Decoded FileName :ABC1X 0400 0109-(1-2)_v2.pdf
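
    Since the question starts from a full path, the same idea can be sketched as a small standalone class that takes the last path segment, drops the namespace prefix (for example cm:) and decodes the rest. This is only a sketch under stated assumptions: the class name AlfrescoPathNames and the method fileNameFromPath are hypothetical, and it assumes the first colon in the last segment is the prefix separator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlfrescoPathNames {
    // "_x", four hexadecimal digits, "_" (for example _x0020_)
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    // Replaces every _xHHHH_ escape with the character whose code is HHHH.
    static String decode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Hypothetical helper: last segment of the path, without its prefix, decoded.
    // Assumption: the first ':' in the segment separates the prefix from the name.
    static String fileNameFromPath(String path) {
        String last = path.substring(path.lastIndexOf('/') + 1);
        int colon = last.indexOf(':');
        String local = colon >= 0 ? last.substring(colon + 1) : last;
        return decode(local);
    }

    public static void main(String[] args) {
        String path = "/app:Home/cm:Company_x0020_Home/cm:Abc/cm:TestFile/cm:V4/"
                + "cm:BC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf";
        System.out.println(fileNameFromPath(path)); // BC1X 0400 0109-(1-2)_v2.pdf
        // Text that merely looks like an escape is left alone.
        System.out.println(decode("my_xfile_x0020_name")); // my_xfile name
    }
}
```

    Compiling the pattern once as a constant avoids rebuilding it on every call, which matters if many names are decoded in a loop.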