Ok, I know how to convert a decimal value to 8-bit binary.
For example, the char "A" is 65 in decimal, and it's very simple to convert that into binary.
But what if the decimal value is larger than 255?
For example, the Arabic char "م" is 1605 in decimal,
which is 11001000101 in binary.
But when I convert it on any website, it shows 11011001 10000101.
I want to know how 11001000101
becomes 11011001 10000101.
Your Arabic char "م" has code point 1605 in decimal. This is 0645h in hexadecimal and it is 0000'0110'0100'0101b in binary.
The utf-8 encoding will represent all characters with a code point in the range U+0000 to U+007F with 1 byte, using the following template:
0_______
 ^
 |  7 bits
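As a quick sanity check (a Python sketch of my own, not part of the original answer): for code points below U+0080 the single UTF-8 byte is simply the code point itself, with the high bit clear.

```python
cp = ord("A")   # 65, below 0x80, so the 1-byte template applies
byte = cp       # 0xxxxxxx: the 7 code-point bits pass through unchanged
assert bytes([byte]) == "A".encode("utf-8")
print(format(byte, "08b"))   # 01000001
```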
The utf-8 encoding will represent all characters with a code point in the range U+0080 to U+07FF with 2 bytes. Your Arabic char "م" at U+0645 falls in this range.
When dealing with 2 bytes, the template becomes:
110_____ 10______
   ^        ^
   |        |  6 bits
   |  5 bits
In this template we fill in the lowest 11 bits of the binary representation of your code point, 11001'000101b (it has exactly 11 significant bits):
110_____ 10______
   ^        ^
   | 11001  | 000101
This produces the binary 110'11001'10'000101b: grouped into bytes, that is exactly the 11011001 10000101 the websites show you.
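The template filling above can be sketched in Python (my own illustration, not part of the original answer): take the top 5 bits and the low 6 bits of the code point and OR each group into its template byte.

```python
cp = 1605                        # U+0645, Arabic "م"
b1 = 0b11000000 | (cp >> 6)      # 110xxxxx filled with the top 5 bits
b2 = 0b10000000 | (cp & 0x3F)    # 10xxxxxx filled with the low 6 bits
assert bytes([b1, b2]) == "\u0645".encode("utf-8")
print(format(b1, "08b"), format(b2, "08b"))   # 11011001 10000101
```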
Below is the x86 assembly version of the conversion for code points in the 2-byte range U+0080 to U+07FF (decimal 128 to 2047):
                           <------ AX ------->
mov ax, 1605             ; 0000 0110 0100 0101
                                 /           /
                                /           /    shift left the whole 16 bits, twice
shl ax, 2                ; 0001 1001 0001 0100
                                       \     \
                                        \     \  shift right the lowest 8 bits, twice
shr al, 2                ; 0001 1001 0000 0101
                           |||        ||
                           |||        ||         put in the template bits
or ax, 1100000010000000b ; 1101 1001 1000 0101
                           <- AH --> <-- AL ->
Now the AH register contains the first byte of the utf-8 encoding and the AL register contains the second byte.
Because x86 is a little-endian architecture, where the lowest byte is stored first in memory, an xchg al, ah instruction will fix the order of the bytes right before moving the result to memory:
xchg al, ah          ; now AL holds the first utf-8 byte, AH the second
mov [somewhere], ax  ; little endian: AL is written first, then AH
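The same shift trick can be mirrored in Python for verification (a sketch of my own; the 16-bit AX register and its AL half are simulated with masks, since Python integers have no fixed width):

```python
ax = 1605                                  # mov ax, 1605
ax = (ax << 2) & 0xFFFF                    # shl ax, 2
ax = (ax & 0xFF00) | ((ax & 0x00FF) >> 2)  # shr al, 2 (only the low byte shifts)
ax |= 0b1100000010000000                   # or ax, 1100000010000000b
ah, al = ax >> 8, ax & 0xFF                # AH is the first utf-8 byte, AL the second
assert bytes([ah, al]) == "\u0645".encode("utf-8")
print(format(ah, "08b"), format(al, "08b"))   # 11011001 10000101
```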