I am working on an autocorrect program for Japanese sentences, in which a missing character in a sentence is represented by a space.
I am reading from 2 files...
Input file:
はアビガイル
おはよう くん
Dictionary file:
私はアビガイル
おはよう花くん
The missing characters 私 and 花 are represented as a space
How can I find the space in a line read from the input file?
I tried lineFromFile.find(" "),
but it returns garbage, since these are not the usual English characters. I also tried lineFromFile.find('\0x20')
and lineFromFile.find(' ').
I also tried string lineFromFile = u8"あび";,
but the u8 prefix gives the error "identifier 'u8' is undefined".
I am using C++ with Visual Studio 2013 and gcc 4.8.3, and my current code page is Unicode (UTF-8 with signature).
If you think this is a duplicate question, please comment with a link to the existing, ANSWERED question.
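My plan is:
1. Find the index of the space in the input line and store it in spaceIndex.
2. Read the character at spaceIndex in the corresponding dictionary line into a string temp.
3. Replace the space in the input line with temp, so the character at spaceIndex will be temp.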
Please help, I have 3 days :'(
"The missing characters 私 and 花 are represented as a space"
No, they aren't. Looking at the first input line, はアビガイル,
in a hex editor shows that its first character is '\u3000' (UTF-8 bytes E3 80 80),
which is IDEOGRAPHIC SPACE, not SPACE (U+0020).
So to find it, you need to use find(u8"\u3000")
or, since Visual Studio 2013 rejects the u8 prefix, find("\xe3\x80\x80").
If you're lucky and all the Japanese characters in your input files are encoded as three bytes in UTF-8, then you can treat them as having fixed positions in the strings and substitute blocks of three bytes from one string to another.