I am analyzing some text files in Matlab, and it is garbling some of the text. I am using R2017a (9.2.0.538062) 64-bit (maci64). Please note the accented characters.
Other text editors are reading the file ("War and Peace.txt") correctly (Textmate, Emacs, Textedit, and GNU Octave), as well as other programs (Python, Ruby, Mathematica).
It was in July, 1805, and the speaker was the well-known Anna Pávlovna Schérer, maid of honor and favorite of the Empress Márya Fëdorovna.
Whereas in Matlab
It was in July, 1805, and the speaker was the well-known Anna Pávlovna Schérer, maid of honor and favorite of the Empress Márya Fëdorovna.
My Question
Is there a Matlab (preferences?) setting that will read this text accurately? Matlab appears to be garbling valid extended-ASCII characters (mostly in the 200-255 range).
I faced the same problem when trying to read strings from a text file. In my case, the problem was that I had saved the .txt file in ANSI encoding. After many trials, I came up with a solution. First, save the file in UTF-8 encoding. Then, in your MATLAB code, specify the encodingIn argument in the fopen call.
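To see which encoding fopen falls back to when none is given, the four-output form of fopen can be queried on an open file. This is only a minimal sketch, and it assumes a file named text.txt in the current folder (a placeholder name):

```matlab
% Assumes 'text.txt' exists in the current folder (placeholder file name).
fid = fopen('text.txt', 'r');            % no encoding given: MATLAB uses its default
assert(fid ~= -1, 'Could not open text.txt');
[~, ~, ~, enc] = fopen(fid);             % fourth output is the encoding in effect
fclose(fid);
fprintf('fopen default encoding: %s\n', enc);
```

If the reported encoding is not the one the file was saved in, accented characters are likely to come out garbled unless the encoding is passed explicitly.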
A test script could look something like this:
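close all; clearvars; clc;
% Open the file for reading with native byte order, decoding it as UTF-8
fileID = fopen('text.txt', 'r', 'n', 'UTF-8');
% Read whitespace-delimited words into a cell array
C = textscan(fileID, '%s');
fclose(fileID);
celldisp(C)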
The output of this code would be:
C{1}{1} =
It
C{1}{2} =
was
C{1}{3} =
in
C{1}{4} =
July,
C{1}{5} =
1805,
C{1}{6} =
and
C{1}{7} =
the
C{1}{8} =
speaker
C{1}{9} =
was
C{1}{10} =
the
C{1}{11} =
well-known
C{1}{12} =
Anna
C{1}{13} =
Pávlovna
C{1}{14} =
Schérer,
C{1}{15} =
maid
C{1}{16} =
of
C{1}{17} =
honor
C{1}{18} =
and
C{1}{19} =
favorite
C{1}{20} =
of
C{1}{21} =
the
C{1}{22} =
Empress
C{1}{23} =
Márya
C{1}{24} =
Fëdorovna.
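If re-saving the file is not an option, another approach that may work is to read the raw bytes and decode them yourself with native2unicode. The sketch below assumes the file is named text.txt (a placeholder) and is encoded as UTF-8. For a file that really is in an ANSI code page, a name such as 'windows-1252' could be passed instead; which code page applies is an assumption you would need to check for your own file.

```matlab
% Assumes 'text.txt' exists and is UTF-8 encoded (both are assumptions).
fid = fopen('text.txt', 'r');
assert(fid ~= -1, 'Could not open text.txt');
bytes = fread(fid, Inf, 'uint8=>uint8')';   % raw bytes as a row vector, no decoding yet
fclose(fid);

% Decode the bytes into MATLAB characters using the stated encoding.
txt = native2unicode(bytes, 'UTF-8');
disp(txt)
```

Reading with a uint8 precision bypasses the encoding chosen by fopen, so the decoding step is fully under your control.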