Appending text to an existing file using fwrite() with mode "a+" writes a weird string

I am writing a program that appends text to a file every time it is called. I don't want to rewrite the entire file, and I want each new piece of text to start on a new line. Here is my test code:

void writeFile()
{
    FILE *pFile;
    char* data = "hahaha";
    int data_size = 7;
    int count = 1;
    pFile = fopen("textfile.bin","a+");
    if (pFile!=NULL)
    {
        fwrite (data, data_size, count, pFile);
        fclose (pFile);
    }
}

The first time it was called, everything worked fine: a new file was created and the data was written successfully. But when I called it again, expecting the new data to be appended, I got weird strings in the file, something like: 慨慨慨栀桡桡a.

I am not really familiar with C++ I/O functions. Can someone tell me what I did wrong? Also, any suggestions for appending text on a new line?

Solution

I think you are running into a code set issue, and the program you're using to look at the file you write expects to find UTF-16 data in the file.

I base this on an analysis of the string you quote:

慨慨慨栀桡桡a

When that (UTF-8) data is converted to Unicode values, I get:

0xE6 0x85 0xA8 = U+6168
0xE6 0x85 0xA8 = U+6168
0xE6 0x85 0xA8 = U+6168
0xE6 0xA0 0x80 = U+6800
0xE6 0xA1 0xA1 = U+6861
0xE6 0xA1 0xA1 = U+6861
0x61 = U+0061
0x0A = U+000A
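
For anyone who wants to reproduce that conversion, here is a minimal sketch of decoding one UTF-8 sequence into a code point. It is not the utf8-unicode program mentioned in this answer; the function name utf8_decode is made up for illustration, and the decoder is simplified (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (at most len bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the sequence is truncated or malformed.
 * Simplified sketch: overlong forms and surrogates are not rejected. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need;
    uint32_t v;

    if (len == 0)
        return 0;

    /* The lead byte tells us how many bytes the sequence occupies. */
    if (s[0] < 0x80)                { need = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; v = s[0] & 0x07; }
    else
        return 0;

    if (need > len)
        return 0;

    /* Each continuation byte has the form 10xxxxxx and adds 6 bits. */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *cp = v;
    return need;
}
```

Feeding it the bytes 0xE6 0x85 0xA8 consumes three bytes and yields the code point 0x6168, matching the first row of the listing.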

The Unicode value U+6168 is represented in little-endian UTF-16 as the bytes 0x68 0x61; the ASCII code for h is 104 (0x68) and for a is 97 (0x61). So the data is probably written correctly, but the program displaying it is interpreting it incorrectly.
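
To make the misinterpretation concrete, here is a small sketch that takes the 14 bytes two calls to writeFile() would leave in the file and pairs them up the way a viewer assuming UTF-16LE would. The function names decode_utf16le and show_reinterpretation are invented for this illustration, and the byte array is my reconstruction of the file contents from the code in the question.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Combine pairs of bytes into little-endian 16-bit code units.
 * Returns the number of code units stored in out. */
size_t decode_utf16le(const unsigned char *bytes, size_t n_bytes,
                      uint16_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < n_bytes && n < out_cap; i += 2)
        out[n++] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));
    return n;
}

/* Print the code units a UTF-16LE viewer would see in the file. */
void show_reinterpretation(void)
{
    /* "hahaha" plus its terminating null byte, written twice */
    static const unsigned char file_bytes[] = {
        'h', 'a', 'h', 'a', 'h', 'a', 0,
        'h', 'a', 'h', 'a', 'h', 'a', 0
    };
    uint16_t units[8];
    size_t n = decode_utf16le(file_bytes, sizeof file_bytes, units, 8);

    for (size_t i = 0; i < n; i++)
        printf("U+%04X\n", (unsigned)units[i]);
}
```

The first null byte shifts the alignment, so the second "hahaha" pairs up as 0x00 0x68 (U+6800), then 0x61 0x68 (U+6861) twice, and the last a pairs with the second null byte to give U+0061.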

As I noted in a comment:

If you want lines in the file, you'll need to put them there (by adding newlines to the data that is written), because fwrite() won't output any newlines unless they are in the data it is given to write. You have written a null byte to the file (because you used data_size = 7), which means the file is not really a text file (text files don't contain null bytes). What happens next depends on the code set you're using.
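
One way to apply that advice, as a sketch rather than the only fix: write just the characters of the string (no terminating null) and add the newline yourself. The name append_line is made up here, and I've used mode "a" instead of "a+" on the assumption that the function never needs to read the file back.

```c
#include <stdio.h>

/* Append text followed by a newline to the file at path.
 * Returns 0 on success, -1 on failure. */
int append_line(const char *path, const char *text)
{
    /* "a": append-only; creates the file if it does not exist */
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    /* fputs() stops at the terminating null and does not write it */
    int ok = (fputs(text, fp) != EOF) && (fputc('\n', fp) != EOF);

    /* fclose() flushes buffered data, so its result matters too */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling append_line("textfile.txt", "hahaha") twice should leave two lines in the file, with no null bytes to confuse a viewer into guessing UTF-16. If you prefer to stay with fwrite(), passing strlen(data) as the size instead of 7 has the same effect of leaving the null byte out.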

The final a shows up as a single-byte code because the second null byte, which pairs with it to form U+0061, isn't visible in what's pasted on this page. The trailing U+000A was added by the echo in the command line I used for the analysis (where utf8-unicode is a program I wrote):

 echo "慨慨慨栀桡桡a" | utf8-unicode