c, linux, unicode, utf-8

Binary to UTF-8 in C


I am working on an application in C where I need to show Unicode UTF-8 characters. I receive the values as a binary byte stream stored in a character array, for example 11010000 10100100, which is the UTF-8 encoding of the Unicode character "Ф" (U+0424).

I want to store and display the character. I tried converting the binary string to a hexadecimal character array with this function:

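#include <stdio.h>

#define MAX 64  /* buffer size; not defined in the original snippet */

/* Expects bData to contain only '0'/'1' characters (no spaces). */
void binaryToHex(char *bData) {
    char hexaDecimal[MAX];
    int temp;
    long int i = 0, j = 0;

    /* Turn the ASCII digits '0'/'1' into the values 0/1 (modifies bData in place). */
    while (bData[i]) {
        bData[i] = bData[i] - '0';
        ++i;
    }

    /* Walk from the least significant bit, 4 bits (one hex digit) at a time. */
    --i;
    while (i >= 3) {
        temp = bData[i - 3] * 8 + bData[i - 2] * 4 + bData[i - 1] * 2 + bData[i];
        if (temp > 9)
            hexaDecimal[j++] = temp + 55;   /* 10..15 -> 'A'..'F' */
        else
            hexaDecimal[j++] = temp + 48;   /* 0..9 -> '0'..'9' */
        i = i - 4;
    }

    /* Handle 1 to 3 leftover high-order bits. */
    if (i == 2)
        hexaDecimal[j] = bData[i - 2] * 4 + bData[i - 1] * 2 + bData[i] + 48;
    else if (i == 1)
        hexaDecimal[j] = bData[i - 1] * 2 + bData[i] + 48;
    else if (i == 0)
        hexaDecimal[j] = bData[i] + 48;
    else
        --j;

    printf("Equivalent hexadecimal value: ");
    char hexVal[MAX];
    int k = 0;
    /* hexaDecimal holds the digits in reverse order: emit them most significant
       first, putting the two characters '\\' and 'x' in front of every pair. */
    while (j >= 0) {
        char ch = hexaDecimal[j--];
        if (j % 2 == 0) {
            hexVal[k++] = '\\';
            hexVal[k++] = 'x';
        }
        hexVal[k++] = ch;
        printf("%c", ch);
    }
    hexVal[k] = '\0';

    /* Prints the literal text \xD0\xA4, not the character: "\x" escapes are
       only interpreted by the compiler inside string literals. */
    printf("\n%s\n", hexVal);
}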

Printing the result only shows the text \xD0\xA4, which I built with string manipulation. But when I write the bytes as a string literal, either as

 char s[] = "\xD0\xA4";
 /* or */
 char *s = "\xD0\xA4";
 printf("\n %s", s);

it prints the desired character "Ф". How can I build the correct string dynamically at runtime? Is there a library for this in C?

The binaryToHex conversion code is adapted from http://www.cquestions.com/2011/07/binary-to-hexadecimal-conversion-in.html.

Is there a way to print it directly from the binary digits or from the hex value? Or is there an alternative approach?


Solution

  • In the end, I solved the problem for now by converting the UTF-8 binary char array to the actual code point bits, i.e. stripping the prefix bits so that 11010000 10100100 becomes 10000 100100, then converting that to decimal (1060, which is U+0424) and finally converting the decimal code point back to UTF-8. The link below is the one I used for the decimal-to-UTF-8 step:

    C++ Windows decimal to UTF-8 Character Conversion

    Resources I used:

    https://www.youtube.com/watch?v=vLBtrd9Ar28

    https://web.archive.org/web/20180216185523/http://www.zehnet.de/2005/02/12/unicode-utf-8-tutorial/