
In C, how do I convert "ñ" to wchar_t when it comes from a variable rather than the literal L"ñ" (something like Lmyvar)?


I have a problem with the following code:

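#include <stdio.h>
#include <wchar.h>

int convert_to_opcodes(const wchar_t *parts) {
// Return the value of the first wide character. (Copying it into a
// wchar_t[1] with wcscpy overflowed: the terminating L'\0' needs room too.)
return (int)parts[0];
}
int main(void) {
const wchar_t *text1=L"ñ";
printf("Opcode: %x\n",convert_to_opcodes(text1));

return 0;
}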

This code works, but I want to get the same result from argv[1] or another char * variable instead of the literal L"ñ":

#include <stdio.h>
#include <wchar.h>

int convert_to_opcodes(const wchar_t *parts) {
return (int)parts[0];
}
int main(void) {
char *example="ñ";
wchar_t *text1=example;  // Does not work: a char * is not a wchar_t *.
printf("Opcode: %x\n",convert_to_opcodes(text1));

return 0;
}

Question:

For example, how do I get the opcode of letters A-Z, digits 0-9, and special characters like "ñ", "%", "$", and "?"?

What I hope to get is the opcode of each character, for example "ñ" = f1.


Solution

  • An "opcode" is a CPU instruction. I think you mean "Unicode Code Point".

    Converting text from its encoded form into a wide string is called decoding, and it can be done with mbrtowc.[1]

    You didn't specify the encoding, so I assumed UTF-8.

    #include <locale.h>
    #include <stdio.h>
    #include <string.h>
    #include <wchar.h>
    
    int main( void ) {
       // Decode as UTF-8. Locale names are platform-specific, so check the result.
       if ( !setlocale( LC_ALL, "en_US.utf8" ) ) {
          fprintf( stderr, "UTF-8 locale not available\n" );
          return 1;
       }
    
       const char *text_utf8 = "ñ";   // Assumes the compiler stores this literal as UTF-8.
       size_t remaining = strlen( text_utf8 );
    
       mbstate_t state;
       memset( &state, 0, sizeof( state ) );
    
       while ( remaining ) {
          wchar_t ucp;
          size_t rv = mbrtowc( &ucp, text_utf8, remaining, &state );
    
          if ( rv == 0 )            // NUL encountered.
             break;
          if ( rv == (size_t)-2 )   // Incomplete sequence encountered.
             break;
          if ( rv == (size_t)-1 )   // Invalid sequence encountered.
             break;
    
          text_utf8 += rv;          // Advance past the bytes just decoded.
          remaining -= rv;
    
          printf( "U+%06lX\n", (unsigned long)ucp );
       }
    
       return 0;
    }
    
    U+0000F1
    

    1. Note that the example on the linked page is incorrect/incomplete because it doesn't handle errors: (size_t)-2 and (size_t)-1 are both greater than 0, so a loop that only stops when the return value is not greater than 0 treats them as valid lengths.