
How do I get the same length for one letter regardless of language?


I am a beginner studying dynamic allocation in C++.

I want the length of a string to come out the same whether I enter Korean or English, so I am using wstring. However, wstring.length() and wstring.size() still seem to behave as in a multibyte system: a Korean letter is counted as 2 bytes and an English letter as 1 byte. How do I get the same size for one letter regardless of language?

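#include <cstddef>
#include <iostream>
#include <string>

void ex_03() {
    std::wcout << std::endl << "Your Input : " << std::endl;
    std::wstring inputval;
    std::wcin >> inputval;
    std::size_t slength = inputval.size();

    // One extra element for the terminating L'\0'.
    wchar_t* p = new wchar_t[slength + 1];

    std::size_t cnt = 0;
    for (wchar_t a : inputval) {
        p[cnt] = a;
        cnt++;
    }
    // Terminate the buffer so that printing p stops at the end of the input.
    p[cnt] = L'\0';

    std::wcout << std::endl << "your Input length : " << slength << std::endl << "String is : " << p;

    delete[] p;
}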

I thought using wstring would solve this, but the result is strange.


Solution

  • Given that your task is:

    Write a program that receives a string as input, and then stores it in dynamically allocated memory using the new operator. The output of the program should be the length of the string and the string itself.

    ... it's almost certainly not required to handle Unicode strings. You don't even need to use std::wcout and std::wcin; the task probably gives you English input only and std::string would be enough.
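A minimal sketch of what the task probably expects, assuming single-word ASCII input. The names `copy_to_heap` and `run_task` are made up for illustration and are not part of the task statement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <string>

// Hypothetical helper: copies `input` into a buffer allocated with new[].
// The caller owns the buffer and must release it with delete[].
char* copy_to_heap(const std::string& input) {
    const std::size_t length = input.size();
    char* buffer = new char[length + 1];             // +1 for the terminating '\0'
    std::memcpy(buffer, input.c_str(), length + 1);  // c_str() is null-terminated
    assert(buffer[length] == '\0');                  // the copy includes the terminator
    return buffer;
}

// Hypothetical driver: reads one word and prints its length and contents.
void run_task(std::istream& in, std::ostream& out) {
    std::string input;
    in >> input;
    char* buffer = copy_to_heap(input);
    out << "Length: " << input.size() << '\n'
        << "String: " << buffer << '\n';
    delete[] buffer;
}
```

Calling `run_task(std::cin, std::cout)` from `main` would read a word and print it back. For ASCII input, `size()` equals the number of letters, which is why `std::string` is enough here.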

    See also Getting the actual length of a UTF-8 encoded std::string? for counting code points in UTF-8 strings. Assuming your std::cin gives you UTF-8 strings (it almost certainly does), you could use this approach.
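A common way to count code points, sketched here under the assumption that the input is valid UTF-8, is to count every byte that is not a continuation byte (continuation bytes have the bit pattern 10xxxxxx). The function name `count_code_points` is illustrative, and the code may differ from what the linked question uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points by skipping continuation bytes (10xxxxxx).
// Assumption: `utf8` holds valid UTF-8; malformed input is not detected.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Usage sketch. The Korean text is written as byte escapes so the result
// does not depend on the encoding of the source file.
void count_code_points_demo() {
    const std::string english = "ab";
    const std::string korean = "\xED\x95\x9C\xEA\xB8\x80";  // U+D55C U+AE00
    assert(english.size() == 2 && count_code_points(english) == 2);
    assert(korean.size() == 6 && count_code_points(korean) == 2);
}
```

With this approach both the two-letter English word and the two-syllable Korean word report a length of 2, even though `size()` reports 2 and 6 bytes respectively.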

    However, even counting code points in UTF-8 strings isn't enough because there are graphemes which consist of multiple code points, such as é (U+00E9, Latin Small Letter E with Acute) which can also be formed by a combining acute accent (U+0301) with base letter e (U+0065).

    Furthermore, Devanagari Letter Qa क़ (U+0958) can also be represented by the sequence:

    • Devanagari Letter Ka (U+0915)
    • Devanagari Sign Nukta (U+093C)
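To make the two examples above concrete, here is a small sketch that decodes UTF-8 into code points. It assumes valid UTF-8 and does no error checking; `decode_utf8` is a hypothetical helper, not a standard library function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes UTF-8 bytes into a sequence of code points.
// Assumption: the input is valid UTF-8; malformed sequences are not rejected.
std::u32string decode_utf8(const std::string& utf8) {
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t code_point = 0;
        int extra = 0;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)                { code_point = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { code_point = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { code_point = lead & 0x0F; extra = 2; }
        else                            { code_point = lead & 0x07; extra = 3; }
        ++i;
        for (int k = 0; k < extra && i < utf8.size(); ++k, ++i) {
            code_point = (code_point << 6) |
                         (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        out.push_back(code_point);
    }
    return out;
}

// Usage sketch: the same visible letter yields different code point counts.
void decode_utf8_demo() {
    assert(decode_utf8("\xC3\xA9").size() == 1);               // U+00E9
    assert(decode_utf8("e\xCC\x81").size() == 2);              // U+0065 U+0301
    assert(decode_utf8("\xE0\xA5\x98").size() == 1);           // U+0958
    assert(decode_utf8("\xE0\xA4\x95\xE0\xA4\xBC").size() == 2);  // U+0915 U+093C
}
```

Each pair renders as one letter on screen, yet a code point count gives 1 for the precomposed form and 2 for the combining sequence.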

    There are numerous other examples. The point is that getting the grapheme length of a string is extremely complicated and likely not what the task requires of you. See also Cross-platform iteration of Unicode string (counting Graphemes using ICU).