In a Windows 10 app I am trying to read a string from a .txt file and set it as the text of a RichEditBox:
Code variant 1:
var read = await FileIO.ReadTextAsync(file, Windows.Storage.Streams.UnicodeEncoding.Utf8);
txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, read);
Code variant 2:
var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.ReadWrite);
ulong size = stream.Size;
using (var inputStream = stream.GetInputStreamAt(0))
{
    using (var dataReader = new Windows.Storage.Streams.DataReader(inputStream))
    {
        dataReader.UnicodeEncoding = Windows.Storage.Streams.UnicodeEncoding.Utf8;
        uint numBytesLoaded = await dataReader.LoadAsync((uint)size);
        string text = dataReader.ReadString(numBytesLoaded);
        txt.Document.SetText(Windows.UI.Text.TextSetOptions.FormatRtf, text);
    }
}
With some files I get this error: "No mapping for the Unicode character exists in the target multi-byte code page"
I found one workaround:
IBuffer buffer = await FileIO.ReadBufferAsync(file);
DataReader reader = DataReader.FromBuffer(buffer);
byte[] fileContent = new byte[reader.UnconsumedBufferLength];
reader.ReadBytes(fileContent);
string text = Encoding.UTF8.GetString(fileContent, 0, fileContent.Length);
txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, text);
But with this code the text is displayed as question marks inside diamonds.
How can I read and display such text files in the correct encoding?
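The question marks in diamonds described above are the Unicode replacement character U+FFFD, which Encoding.UTF8 substitutes for bytes that are not valid UTF-8. That usually means the file is stored in a legacy code page rather than in UTF-8. The following is a minimal sketch of that behaviour, not the method used in the solution below: the class name EncodingDemo, the helper DecodeWithFallback, and the hard-coded fallback code page 1251 are assumptions made for illustration, and it relies on CodePagesEncodingProvider (from the System.Text.Encoding.CodePages package) instead of Portable.Text.Encoding.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Tries strict UTF-8 first. If the bytes are not valid UTF-8, decodes them
    // with the given legacy code page instead. The fallback code page is an
    // assumption: the real encoding has to be known or detected.
    public static string DecodeWithFallback(byte[] bytes, int fallbackCodePage)
    {
        // throwOnInvalidBytes: true raises an exception on invalid sequences
        // instead of silently replacing them with U+FFFD.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(bytes, 0, bytes.Length);
        }
        catch (DecoderFallbackException)
        {
            // Legacy code pages are only available after registering this
            // provider on .NET Core and UWP.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(fallbackCodePage)
                           .GetString(bytes, 0, bytes.Length);
        }
    }

    public static void Main()
    {
        // The word "Привет" encoded in windows-1251 (cp1251).
        byte[] cp1251Bytes = { 0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2 };

        // Lenient UTF-8 decoding replaces the invalid bytes with U+FFFD.
        string lenient = Encoding.UTF8.GetString(cp1251Bytes, 0, cp1251Bytes.Length);
        Console.WriteLine(lenient.IndexOf('\uFFFD') >= 0);

        // Strict UTF-8 fails, so the bytes are decoded as cp1251.
        string text = DecodeWithFallback(cp1251Bytes, 1251);
        Console.WriteLine(text == "\u041F\u0440\u0438\u0432\u0435\u0442");
    }
}
```

A fixed fallback code page only helps when all problem files share one encoding. For files of unknown origin the encoding has to be detected from the bytes themselves.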
Solution:
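1) I made a port of the Mozilla Universal Charset Detector to UWP (published on NuGet). It detects the encoding from the raw bytes of the file (fileContent, read as in the code above):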
ICharsetDetector cdet = new CharsetDetector();
cdet.Feed(fileContent, 0, fileContent.Length);
cdet.DataEnd();
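2) The NuGet library Portable.Text.Encoding then decodes the bytes with the detected charset:
if (cdet.Charset != null)
{
    // A declaration cannot be the sole statement of an if, so braces are required.
    string text = Portable.Text.Encoding.GetEncoding(cdet.Charset).GetString(fileContent, 0, fileContent.Length);
    txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, text);
}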
That's all. Now files in Unicode encodings as well as in legacy code pages (including cp1251 and cp1252) are displayed correctly.