c++ encoding file-io wchar-t

C++: Files, encodings and datatypes


---- PLEASE CLOSE ----

------ Edit ---------

I found where the problem is. I'm going to start a new question for the real problem ....

----------------------

 


Hi,

My Situation:

Linux (Ubuntu 10.04)
gcc

But it has to be platform independent

I have a UTF-8 text file containing special characters such as ¥ © ® Ỳ È Ð. I also have a std::map whose key type must be able to hold these characters. Currently I'm using wchar_t.

I also need strings that can contain these characters. For those I'm using std::wstring.

To read the UTF-8 file I thought of using a wifstream, and for processing each line I use a wstringstream.

I think what I've done so far is reasonable. If not, what would be better?

What is going wrong:

Reading the file stops at the first line that contains a special character. In short, this is what I did:

map<wchar_t, Glyph*> glyphs;

//...

wifstream in(txtFile.c_str());
if (!in.is_open())
{
    throw runtime_error("Cannot open font text file!!");
}
wstring line;
while (getline(in, line)) // edit
{
    printf("Loading glyph\n");
    if (line.length() == 0)
    {
        continue;
    }
    wchar_t keyChar = line.at(0);
    /* First, put the four floats into the wstringstream */
    wstringstream ss(line.substr(2));
    /* Now, read them out */
    Glyph *g = new Glyph();
    ss >> g->x;
    ss >> g->y;
    ss >> g->w;
    ss >> g->h;
    glyphs[keyChar] = g;
    printf("Glyph `%c` (%d): %f, %f, %f, %f\n", keyChar, keyChar, g->x, g->y, g->w, g->h);
}

So, the question is: how do I read a file containing these special characters with a wifstream?

Thanks in advance!

How the file looks:

  0.000000 0.000000 0.010909 0.200000
A 0.023636 0.000000 0.014545 0.200000
B 0.050909 0.000000 0.014545 0.200000
C 0.078182 0.000000 0.014545 0.200000
D 0.105455 0.000000 0.014545 0.200000
E 0.132727 0.000000 0.014545 0.200000

....

È 0.661818 0.400000 0.014545 0.200000
É 0.689091 0.400000 0.014545 0.200000
Ê 0.716364 0.400000 0.014545 0.200000
Ë 0.743636 0.400000 0.014545 0.200000
Ì 0.770909 0.400000 0.012727 0.200000
Í 0.796364 0.400000 0.012727 0.200000
Î 0.821818 0.400000 0.012727 0.200000
Ï 0.847273 0.400000 0.012727 0.200000
Ð 0.872727 0.400000 0.014545 0.200000
Ñ 0.900000 0.400000 0.014545 0.200000

Solution

    1. Use while (getline(in, line)) as the loop condition instead of the eof variant; it's better, see this question.

    2. I'm assuming you're using Windows (as Linux and Mac normally have native UTF-8 platform encoding, which allows you to ignore most of this stuff).

    What I would do is read the whole file as chars and convert it to wchar_t's using the handy functions in this question by me :).

    Remember: on Linux (and probably Mac OS X too) you can just output a UTF-8 stream to a terminal and get the right characters. On Windows, that's a whole different kind of story.
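
The "read the file as chars and convert" approach could look roughly like the sketch below. It is not the code from the linked question (which is not reproduced here), just a minimal illustration under some assumptions: C++11 is available, char32_t is used as the map key instead of wchar_t (wchar_t is only 16 bits on Windows), Glyph is a hypothetical stand-in holding four floats, and decodeUtf8 and loadGlyphs are names made up for this example. The decoder is simplified and does not reject overlong encodings or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Glyph type used in the question.
struct Glyph {
    float x, y, w, h;
};

// Decode the UTF-8 sequence that starts at s[pos] into one code point and
// advance pos past it. Returns false on a truncated or malformed sequence.
// Simplified: overlong forms and surrogate code points are not rejected.
bool decodeUtf8(const std::string& s, std::size_t& pos, char32_t& out) {
    assert(pos < s.size());  // precondition: pos points at a byte of s
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t extra = 0;   // number of continuation bytes expected
    if (lead < 0x80)                { out = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { out = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { out = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { out = lead & 0x07; extra = 3; }
    else return false;       // stray continuation byte or invalid lead byte
    if (pos + extra >= s.size()) return false;  // sequence is cut off
    for (std::size_t i = 1; i <= extra; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
        out = (out << 6) | (c & 0x3F);
    }
    pos += extra + 1;
    return true;
}

// Read "<char> <x> <y> <w> <h>" lines from a byte stream. The loop tests the
// result of getline directly instead of checking eof(). Lines that are empty
// or do not parse are skipped.
std::map<char32_t, Glyph> loadGlyphs(std::istream& in) {
    std::map<char32_t, Glyph> glyphs;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::size_t pos = 0;
        char32_t key = 0;
        if (!decodeUtf8(line, pos, key)) continue;
        std::istringstream fields(line.substr(pos));
        Glyph g;
        if (fields >> g.x >> g.y >> g.w >> g.h) {
            glyphs[key] = g;
        }
    }
    return glyphs;
}
```

To use it on the font file, open a plain std::ifstream (not a wifstream) and pass it to loadGlyphs. Because the bytes are decoded by hand, the result does not depend on which locales are installed, which matters for the platform-independence requirement. Storing Glyph by value rather than as a new'd pointer is a choice made for this sketch, not something the question requires.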