How is text saved in memory?

Suppose there is a UTF-8-encoded file:

file1.txt

汉字

whose binary representation is:

11100110 10110001 10001001 11100101 10101101 10010111
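
The bit sequence above can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the original question.

```python
# Encode the two characters as UTF-8 and print each byte as 8 bits.
text = "汉字"
utf8_bytes = text.encode("utf-8")

bits = " ".join(f"{byte:08b}" for byte in utf8_bytes)
print(bits)
# prints: 11100110 10110001 10001001 11100101 10101101 10010111
```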

If I open it with an editor, the editor reads the bit sequence and decodes it. I can then see 汉字 in the editor, and 汉字 is held in memory.

So now:

What is the bit sequence in memory? Is it the same as above?
Does it depend on the platform?
Is the in-memory result the same regardless of which encoding the file uses?

Solution

As so often, the answer is "it depends".

Generally speaking, in-memory text has to use some encoding, just like on-disk text does.

But whether that encoding is the same as the on-disk one or not depends on the application.

Some applications have a preferred encoding in which they represent all text in memory (such as UTF-16, or even UTF-32/UCS-4 if they are feeling wasteful) and convert to and from it when loading and saving. Others hold the text in memory in the same encoding as used on disk and just decode it as necessary when rendering or searching.
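
To make the difference concrete, here is a small Python sketch that encodes the same two characters in a few encodings an application might plausibly pick for its in-memory form. The list of encodings is just an assumption for illustration; a real editor may use something else entirely, and the choice of byte order (little- or big-endian) is likewise up to the application or platform.

```python
# The same text yields different byte sequences under different encodings.
text = "汉字"

# Illustrative candidates only, not a claim about any particular editor.
encoded = {
    name: text.encode(name)
    for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le")
}

for name, data in encoded.items():
    # bytes.hex(sep) requires Python 3.8 or later.
    print(f"{name:10} {len(data)} bytes  {data.hex(' ')}")

# prints:
# utf-8      6 bytes  e6 b1 89 e5 ad 97
# utf-16-le  4 bytes  49 6c 57 5b
# utf-16-be  4 bytes  6c 49 5b 57
# utf-32-le  8 bytes  49 6c 00 00 57 5b 00 00
```

An application that keeps the file's bytes as-is would hold the first sequence in memory; one that converts to UTF-16 or UTF-32 on load would hold one of the others, even though the file on disk is unchanged.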

There's no universal rule that requires one approach or another. Some languages/platforms have a strong preference.

For example, Java uses UTF-16 for in-memory String objects (except that, as an internal optimization since Java 9, it might sometimes use Latin-1 if the text allows it).
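
As a rough illustration of that idea (written in Python, and not the JVM's actual implementation), the sketch below lists the UTF-16 code units for a string and checks whether the string would fit in Latin-1. The helper names `utf16_code_units` and `fits_latin1` are hypothetical.

```python
def utf16_code_units(s: str) -> list[int]:
    """Return the 16-bit code units of s in UTF-16."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fits_latin1(s: str) -> bool:
    """True if every character of s can be stored in one Latin-1 byte."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print([f"{unit:04X}" for unit in utf16_code_units("汉字")])  # ['6C49', '5B57']
print(fits_latin1("汉字"))  # False: a compact one-byte form is not possible
print(fits_latin1("café"))  # True: every character is below U+0100
```

So for the file in the question, a UTF-16-based in-memory representation would hold the code units 6C49 and 5B57 rather than the six UTF-8 bytes read from disk.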