Tags: c++, unicode, sha1, crypto++, digest

Get SHA1 of Unicode string in Crypto++


I am teaching myself C++ and I have a problem that I have not been able to solve for more than a week. I hope you can help me.

I need to get a SHA1 digest of a Unicode string (like Привет), but I don't know how to do that.

I tried to do it like this, but it returns a wrong digest!

For wstring(L"Ы") it returns - A469A61DF29A7568A6CC63318EA8741FA1CF2A7
I need - 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373

Regards and sorry for my English :).

Crypto++ 5.6.2, Visual C++ 2013

#include <iostream>
#include "cryptopp562\cryptlib.h"
#include "cryptopp562\sha.h"
#include "cryptopp562\hex.h"

int main() {

    std::wstring string(L"Ы");
    int bs_size = (int)string.length() * sizeof(wchar_t);

    byte* bytes_string = new byte[bs_size];

    int n = 0; //real bytes count
    for (int i = 0; i < string.length(); i++) {
        wchar_t wcharacter = string[i];

        int high_byte = wcharacter & 0xFF00;

        high_byte = high_byte >> 8;

        int low_byte = wcharacter & 0xFF;

        if (high_byte != 0) {
            bytes_string[n++] = (byte)high_byte;
        }

        bytes_string[n++] = (byte)low_byte;
    }

    CryptoPP::SHA1 sha1;
    std::string hash;

    CryptoPP::StringSource ss(bytes_string, n, true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            ) 
        ) 
    );

    std::cout << hash << std::endl;

    delete[] bytes_string; // release the buffer allocated with new[]

    return 0;
}

Solution

  • I need to get a SHA1 digest of a Unicode string (like Привет), but I don't know how to do that.

    The trick here is that you need to know how the Unicode string is encoded. On Windows, a wchar_t is 2 octets, while on Linux a wchar_t is 4 octets. There's a Crypto++ wiki page on the topic, Character Set Considerations, but it's not that good.
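To make the size difference concrete, here is a minimal sketch that decodes a std::wstring into Unicode code points on either platform. The helper name to_code_points is hypothetical, and the sketch assumes that a 2-octet wchar_t carries UTF-16 code units while a 4-octet wchar_t carries whole code points.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a std::wstring into Unicode code points.
// Assumption: a 2-octet wchar_t holds UTF-16 code units (Windows) and a
// 4-octet wchar_t holds whole code points (typical Linux).
std::vector<char32_t> to_code_points(const std::wstring& ws)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        char32_t c = static_cast<char32_t>(ws[i]);
        if (sizeof(wchar_t) == 2) {
            c &= 0xFFFF; // keep only the 16-bit code unit
            // A high surrogate followed by a low surrogate encodes one
            // code point above U+FFFF; combine the pair.
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < ws.size()) {
                char32_t lo = static_cast<char32_t>(ws[i + 1]) & 0xFFFF;
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                    ++i; // the low surrogate has been consumed
                }
            }
        }
        out.push_back(c); // unpaired surrogates pass through unchanged
    }
    return out;
}
```

With this helper, to_code_points(L"\u042B") yields the single code point U+042B whether wchar_t is 2 or 4 octets wide.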

    To interoperate most effectively, always use UTF-8. That means you convert UTF-16 or UTF-32 to UTF-8. Because you are on Windows, you will want to call the WideCharToMultiByte function with CP_UTF8 to do the conversion. If you were on Linux, you would use libiconv.
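For illustration only, the UTF-8 encoding step can be sketched by hand in portable C++. The function name to_utf8 is hypothetical, and replacing invalid code points with U+FFFD is a choice made for this sketch, not something the platform functions are guaranteed to do. It also shows why the loop in the question gives a different digest: for 'Ы' (U+042B) that loop emits the bytes 04 2B, while UTF-8 is D0 AB.

```cpp
#include <string>

// Hypothetical helper: encode a sequence of Unicode code points as UTF-8.
// Surrogates and values above U+10FFFF are replaced with U+FFFD
// (a choice made for this sketch).
std::string to_utf8(const std::u32string& code_points)
{
    std::string out;
    for (char32_t cp : code_points) {
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, to_utf8(U"\u042B") returns the two bytes D0 AB. In real code, prefer WideCharToMultiByte or libiconv as described above; the sketch is only meant to show what those functions produce.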

    Crypto++ has a built-in function called StringNarrow that uses standard C++ library facilities. It's in the file misc.h. Be sure to call setlocale before using it.
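The reason for the setlocale call is that the standard library's wide-to-narrow conversions encode into whatever the current C locale specifies. The sketch below uses std::wcsrtombs to show that dependency. It is not the Crypto++ implementation, and the name narrow_with_locale is hypothetical.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string to a multibyte string using
// the current C locale. The result is UTF-8 only if the locale selected
// with setlocale uses UTF-8.
std::string narrow_with_locale(const std::wstring& ws)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = ws.c_str();

    // First pass: a null destination asks only for the required length.
    std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("Character not representable in the current locale");

    std::string out(len, '\0');

    // Second pass: perform the conversion into the sized buffer.
    state = std::mbstate_t();
    src = ws.c_str();
    std::wcsrtombs(&out[0], &src, len, &state);
    return out;
}
```

A caller would typically run std::setlocale(LC_ALL, "") once at startup (from <clocale>) and then call narrow_with_locale(L"Привет"). In the default "C" locale, non-ASCII characters are generally not convertible and the function throws. Whether the result is UTF-8 depends on the platform's locale, which is why the explicit CP_UTF8 conversion is the more predictable route on Windows.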

    Stack Overflow has a few questions on using the Windows function. See, for example, How do you properly use WideCharToMultiByte.


    I need - 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373

    Which hash is it (SHA-1, SHA-256, ...)? Is it an HMAC (keyed hash)? Is the information salted (like a password in storage)? How is it encoded? I have to ask because I cannot reproduce your desired result:

    SHA-1:   2805AE8E7E12F182135F92FB90843BB1080D3BE8
    SHA-224: 891CFB544EB6F3C212190705F7229D91DB6CECD4718EA65E0FA1B112
    SHA-256: DD679C0B9FD408A04148AA7D30C9DF393F67B7227F65693FFFE0ED6D0F0ADE59
    SHA-384: 0D83489095F455E4EF5186F2B071AB28E0D06132ABC9050B683DA28A463697AD
             1195FF77F050F20AFBD3D5101DF18C0D
    SHA-512: 0F9F88EE4FA40D2135F98B839F601F227B4710F00C8BC48FDE78FF3333BD17E4
             1D80AF9FE6FD68515A5F5F91E83E87DE3C33F899661066B638DB505C9CC0153D
    

    Here's the program I used. Be sure to specify the length of the wide string. If you don't (and pass -1 for the length), then WideCharToMultiByte will include the terminating NUL (ASCII-Z) in its calculations. Since we are using a std::string, we don't need the function to include the terminator.

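    #include <windows.h>
    #include <iostream>
    #include <stdexcept>
    #include <string>

    // Header paths follow the layout used in the question
    #include "cryptopp562/sha.h"
    #include "cryptopp562/hex.h"
    #include "cryptopp562/filters.h"
    #include "cryptopp562/channels.h"

    using namespace std;
    using namespace CryptoPP;

    int main(int argc, char* argv[])
    {
        wstring m1 = L"Привет"; string m2;
    
        // First call: ask how many bytes the UTF-8 encoding needs
        int req = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), NULL, 0, NULL, NULL);
        if(req <= 0)
            throw runtime_error("Failed to convert string");
    
        m2.resize((size_t)req);
    
        // Second call: convert into the sized buffer
        int cch = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), &m2[0], (int)m2.length(), NULL, NULL);
        if(cch <= 0)
            throw runtime_error("Failed to convert string");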
    
        // Should not be required
        m2.resize((size_t)cch);
    
        string s1, s2, s3, s4, s5;
        SHA1 sha1; SHA224 sha224; SHA256 sha256; SHA384 sha384; SHA512 sha512;
    
        HashFilter f1(sha1, new HexEncoder(new StringSink(s1)));
        HashFilter f2(sha224, new HexEncoder(new StringSink(s2)));
        HashFilter f3(sha256, new HexEncoder(new StringSink(s3)));
        HashFilter f4(sha384, new HexEncoder(new StringSink(s4)));
        HashFilter f5(sha512, new HexEncoder(new StringSink(s5)));
    
        ChannelSwitch cs;
        cs.AddDefaultRoute(f1);
        cs.AddDefaultRoute(f2);
        cs.AddDefaultRoute(f3);
        cs.AddDefaultRoute(f4);
        cs.AddDefaultRoute(f5);
    
        StringSource ss(m2, true /*pumpAll*/, new Redirector(cs));
    
        cout << "SHA-1:   " << s1 << endl;
        cout << "SHA-224: " << s2 << endl;
        cout << "SHA-256: " << s3 << endl;
        cout << "SHA-384: " << s4 << endl;
        cout << "SHA-512: " << s5 << endl;
    
        return 0;
    }