
Converting an extended ASCII string to Hindi text


I am receiving text over USB communication on Android in the form of extended ASCII characters, like this:

String receivedText = "5286T11ɬ ªË ¦¿¯¾ ¯¾ ɬ ¨¬°:A011605286 ª¿ª ¾®:12:45 ¸Í®°:(9619441121)ª¿ª:-, ®¹¿¦Í°¾ ¡ ®¹¿¦Í°¾ ª¨À, ¾¦¿µ²À ¸Í, ¾¦¿µ²À ªÂ°Íµ °¿®¾°Í͸:- ¡Í°Éª:-, ¬¾¹°, ¸¾¤¾Í°Â¼ ªÂ°Íµ~";

These characters represent a string in Hindi.

I cannot work out how to convert the received string into the equivalent Hindi text. Does anyone know how to do this in Java?

The following is the code I am using to convert the received byte array to a hex string:

public String byteArrayToByteString(byte[] arrayValue, int size) {
        if (arrayValue == null || arrayValue.length == 0)
            return null;

        final char[] hexDigits = "0123456789ABCDEF".toCharArray();

        // Never read past the end of the array, even if size is too large.
        int count = Math.max(0, Math.min(size, arrayValue.length));
        StringBuilder out = new StringBuilder(count * 2);

        for (int i = 0; i < count; i++) {
            int value = arrayValue[i] & 0xFF;     // treat the byte as unsigned
            out.append(hexDigits[value >>> 4]);   // high nibble as a hex digit
            out.append(hexDigits[value & 0x0F]);  // low nibble as a hex digit
        }

        return out.toString();
    }

Let me know if this helps in finding a solution.

EDIT:

The text is UTF-16 encoded, and the characters in the receivedText string are the extended ASCII form of the Hindi characters.

New Edit

I have new characters

String value = "?®Á?Ƕ ¡??°¿¯¾";

This is मुकेश followed by "dangaria" in Hindi. Google Translate does not translate "dangaria" into Hindi, so I cannot provide the Hindi version of it.

I talked to the person who does the encoding. He said that he removes the leading part of each character before encoding: if \u0905 represents अ in Hindi, he removes the \u09 from the input and converts the remaining 05 into extended ASCII form.

So the new input string I provided is encoded as explained above, i.e. \u09 has been removed, the rest has been converted into extended ASCII, and the result is sent to the device over USB.

Let me know if this explanation helps in finding a solution.


Solution

  • Generally, for a byte array that you know to be a string value, you can use the following.

    Assuming byte[] someBytes:

    String stringFromBytes = new String(someBytes, "UTF-16");
    

    You may replace "UTF-16" with the appropriate charset, which you can find after some experimentation. The list of Java's supported character encodings may be of help.

    From the details you have provided I would suggest considering the following:

    • If you're reading a file from a USB drive, Android might have existing frameworks that will help you do this in a more standard way.
    • If you really do need to read and manipulate the bytes from the USB port directly, make sure that you are familiar with the API/protocol of the data you are reading. Some of the bytes may be control messages or similar data that cannot be converted to strings, and you will need to identify exactly where in the byte stream the string begins (and ends).
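
If the sender's scheme is what the question's latest edit describes, a standard charset may not be enough and the bytes would need to be mapped by hand. The following is a minimal sketch under assumptions that are not confirmed by the question: each byte below 0x80 is plain ASCII, and each byte at or above 0x80 is the low byte of a Devanagari code point with the high bit set, so the original character is 0x0900 + (byte & 0x7F). The class and method names are made up for illustration. The mapping does agree with the visible samples, for example ® (0xAE) gives U+092E (म) and Á (0xC1) gives U+0941 (ु).

```java
// Hypothetical helper, not part of any library.
class DevanagariDecoder {

    // Assumption: the sender dropped the 0x09 high byte of the Devanagari block.
    private static final int DEVANAGARI_BASE = 0x0900;

    // Decodes raw bytes as received from the device.
    static String decode(byte[] raw) {
        StringBuilder out = new StringBuilder(raw.length);
        for (byte b : raw) {
            int value = b & 0xFF; // treat the byte as unsigned
            if (value < 0x80) {
                // Assumption: plain ASCII (digits, punctuation) is sent unchanged.
                out.append((char) value);
            } else {
                // Assumption: clear the high bit and restore the 0x09 prefix.
                out.append((char) (DEVANAGARI_BASE + (value & 0x7F)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Bytes assumed to correspond to the first word of the second sample.
        byte[] sample = {(byte) 0xAE, (byte) 0xC1, (byte) 0x95, (byte) 0xC7, (byte) 0xB6};
        System.out.println(decode(sample)); // prints मुकेश
    }
}
```

It is probably safer to decode from the raw byte array than from a String: bytes in the range 0x80 to 0x9F are not printable in most single-byte charsets, which may be why some characters in the second sample show up as "?".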