Tags: c, utf-8, pipe, emoji

Reading Emojis through a pipe in C


I have a pipe with an endless stream of strings being written to it. These strings are a mix of ASCII and emoji. The problem is that I am reading them like this:

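char msg[100];
ssize_t length = read(fd, msg, sizeof msg - 1);  /* may stop in the middle of a multibyte character */
if (length < 0) length = 0;                      /* read() returns -1 on error */
msg[length] = 0;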

But sometimes an emoji, which I'm guessing is multibyte, gets cut in half, and when I print it to the screen I get the diamond question mark (the symbol shown for invalid UTF-8).

If anyone knows how to prevent this please fill me in; I've been searching for a while now.


Solution

  • If you're reading chunks of bytes and want to output chunks of UTF-8, you'll have to do at least some minimal UTF-8 decoding yourself. The simplest check is to look at each byte (call it b) and see whether it is a continuation byte, i.e. whether its top two bits are 10:

    bool is_cont = (0x80 == (0xC0 & b));
    

    Any byte that is not a continuation byte starts a sequence, which continues until the next non-continuation byte. A UTF-8 sequence is at most 4 bytes long, so a 4-byte buffer is enough to hold a sequence that is split across reads.
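One way to put this together is sketched below. It is not a definitive implementation: the names `utf8_reader`, `seq_len`, `utf8_complete_prefix` and `read_utf8` are made up for illustration, the buffer passed in is assumed to be at least 5 bytes, and error handling is minimal. After each read, the code checks whether the data ends in the middle of a sequence, holds those trailing bytes (at most 3) back in a small carry buffer, and prepends them to the next read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static bool is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Total sequence length announced by a lead byte. ASCII and invalid
 * lead bytes count as length 1, so they are passed through unchanged. */
static size_t seq_len(unsigned char lead)
{
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Number of leading bytes of buf[0..len) that can be emitted without
 * splitting the final sequence. The rest (at most 3 bytes) is the start
 * of a sequence whose remaining bytes have not arrived yet. */
static size_t utf8_complete_prefix(const char *buf, size_t len)
{
    size_t back = 0; /* trailing continuation bytes, looking at most 3 back */
    while (back < len && back < 3 && is_cont((unsigned char)buf[len - 1 - back]))
        back++;
    if (back == len)
        return len;  /* empty, or nothing but continuation bytes: pass through */
    size_t lead = len - 1 - back;
    size_t need = seq_len((unsigned char)buf[lead]);
    return (back + 1 < need) ? lead : len;
}

/* Hypothetical reader state: the descriptor plus the held-back bytes. */
struct utf8_reader {
    int    fd;
    char   carry[4];
    size_t carry_len;
};

/* Like read(), but msg only ever receives whole UTF-8 sequences and is
 * NUL-terminated. Returns the string length, 0 on EOF, -1 on error.
 * Assumes cap >= 5. An incomplete sequence at EOF is dropped. */
static ssize_t read_utf8(struct utf8_reader *r, char *msg, size_t cap)
{
    size_t len = r->carry_len;
    memcpy(msg, r->carry, len);          /* bytes held back last time */
    for (;;) {
        ssize_t n = read(r->fd, msg + len, cap - 1 - len);
        if (n <= 0)
            return n;
        len += (size_t)n;
        size_t ok = utf8_complete_prefix(msg, len);
        if (ok == 0)
            continue;                    /* still inside the first character */
        r->carry_len = len - ok;
        memcpy(r->carry, msg + ok, r->carry_len);
        msg[ok] = '\0';
        return (ssize_t)ok;
    }
}
```

With this in place, the read in the question would become something like `struct utf8_reader r = { fd, {0}, 0 };` once, followed by `length = read_utf8(&r, msg, sizeof msg);` in the loop. Note that this only checks sequence boundaries; it does not validate the UTF-8 itself, and an emoji made of several code points (for example one with a skin-tone modifier) can still be split between two calls, although each piece remains valid UTF-8.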