
MurmurHash3 Test Vectors


I'm trying to port a C# implementation of MurmurHash3 to VB.Net.

It runs... but can someone provide me with some known Test Vectors to verify correctness?

  • Known string text
  • Seed value
  • Result of MurmurHash3

Thanks in advance.

Edit: I'm limiting the implementation to the 32-bit MurmurHash3, but vectors for the 64-bit implementation would also be welcome.


Solution

  • I finally got around to creating a Murmur3 implementation, and I managed to translate the SMHasher test code. My implementation gives the same result as the SMHasher test, so I can finally give some useful test vectors that I assume to be correct.

    This is for Murmur3_x86_32 only

    | Input        | Seed       | Expected   | Notes                                                    |
    |--------------|------------|------------|----------------------------------------------------------|
    | (no bytes)   | 0          | 0          | With zero data and zero seed, everything becomes zero    |
    | (no bytes)   | 1          | 0x514E28B7 | Ignores nearly all the math                              |
    | (no bytes)   | 0xffffffff | 0x81F16F39 | Make sure your seed uses unsigned 32-bit math            |
    | FF FF FF FF  | 0          | 0x76293B50 | Make sure 4-byte chunks use unsigned math                |
    | 21 43 65 87  | 0          | 0xF55B516B | Endian order. UInt32 should end up as 0x87654321         |
    | 21 43 65 87  | 0x5082EDEE | 0x2362F9DE | Special seed value eliminates initial key with xor       |
    | 21 43 65     | 0          | 0x7E4A8634 | Only three bytes. Should end up as 0x654321              |
    | 21 43        | 0          | 0xA0F7B07A | Only two bytes. Should end up as 0x4321                  |
    | 21           | 0          | 0x72661CF4 | Only one byte. Should end up as 0x21                     |
    | 00 00 00 00  | 0          | 0x2362F9DE | Make sure compiler doesn't see zero and convert to null  |
    | 00 00 00     | 0          | 0x85F0B427 |                                                          |
    | 00 00        | 0          | 0x30F4C306 |                                                          |
    | 00           | 0          | 0x514E28B7 |                                                          |
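
For anyone who wants something to run these rows against, here is a minimal Python sketch of Murmur3_x86_32. It is not the Delphi or Lua code mentioned in this answer; it is my reading of the public reference algorithm, and the names `murmur3_x86_32`, `rotl32`, and `VECTORS` are chosen here for illustration. Python integers are unbounded, so every multiplication is masked back to 32 bits by hand, which is the "unsigned 32-bit math" the notes in the table warn about.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Return the Murmur3_x86_32 hash of data as an unsigned 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    length = len(data)
    body_end = length - (length % 4)

    # Body: whole 4-byte chunks, each read as a little-endian UInt32.
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1 to 3 bytes, also little-endian (21 43 65 -> 0x654321).
    tail = data[body_end:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length, then avalanche the bits.
    h ^= length & MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


# A few rows from the table above: (input bytes, seed, expected hash).
VECTORS = [
    (b"", 0, 0x00000000),
    (b"", 1, 0x514E28B7),
    (b"", 0xFFFFFFFF, 0x81F16F39),
    (bytes.fromhex("FFFFFFFF"), 0, 0x76293B50),
    (bytes.fromhex("21436587"), 0, 0xF55B516B),
    (bytes.fromhex("21436587"), 0x5082EDEE, 0x2362F9DE),
    (bytes.fromhex("214365"), 0, 0x7E4A8634),
    (bytes.fromhex("00"), 0, 0x514E28B7),
]

for data, seed, expected in VECTORS:
    actual = murmur3_x86_32(data, seed)
    status = "ok" if actual == expected else "MISMATCH"
    print(f"{data.hex() or '(no bytes)':<10} seed={seed:#010x} -> {actual:#010x} {status}")
```

In a language with fixed-width unsigned integers (such as `UInteger` in VB.Net) the masking disappears, but overflow checking may then need to be switched off for the multiplications.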
    

    For those of you porting to a language that doesn't have actual arrays, I also have some string-based tests. For these tests:

    • all strings are assumed to be UTF-8 encoded
    • and do not include any null terminator
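
To make the two rules above concrete, this short Python sketch shows the bytes that would actually be hashed for a few of the test strings. The helper name `to_hash_input` is made up for this illustration. The point is that the hash sees the UTF-8 bytes, not the characters, so the eight-character pi string is 16 bytes long, and embedded nulls count as ordinary bytes.

```python
def to_hash_input(text: str) -> bytes:
    """Encode a test string the way the string vectors assume: UTF-8, no terminator."""
    return text.encode("utf-8")


samples = ["", "\0\0\0\0", "abc", "\u03c0" * 8, "a" * 256]

for text in samples:
    data = to_hash_input(text)
    # Show character count, byte count, and the first few bytes in hex.
    print(f"chars={len(text):3d} bytes={len(data):3d} first={data[:4].hex(' ')}")
```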

    I'll leave these in code form:

    TestString("", 0, 0); //empty string with zero seed should give zero
    TestString("", 1, 0x514E28B7);
    TestString("", 0xffffffff, 0x81F16F39); //make sure seed value is handled unsigned
    TestString("\0\0\0\0", 0, 0x2362F9DE); //make sure we handle embedded nulls
    
    
    TestString("aaaa", 0x9747b28c, 0x5A97808A); //one full chunk
    TestString("aaa", 0x9747b28c, 0x283E0130); //three characters
    TestString("aa", 0x9747b28c, 0x5D211726); //two characters
    TestString("a", 0x9747b28c, 0x7FA09EA6); //one character
    
    //Endian order within the chunks
    TestString("abcd", 0x9747b28c, 0xF0478627); //one full chunk
    TestString("abc", 0x9747b28c, 0xC84A62DD);
    TestString("ab", 0x9747b28c, 0x74875592);
    TestString("a", 0x9747b28c, 0x7FA09EA6);
    
    TestString("Hello, world!", 0x9747b28c, 0x24884CBA);
    
    //Make sure you handle UTF-8 high characters. A bcrypt implementation messed this up
    TestString("ππππππππ", 0x9747b28c, 0xD58063C1); //U+03C0: Greek Small Letter Pi
    
    //String of 256 characters.
    //Make sure you don't store string lengths in a char, and overflow at 255 bytes (as OpenBSD's canonical BCrypt implementation did)
    TestString(StringOfChar("a", 256), 0x9747b28c, 0x37405BDC);
    

    I'll post just two of the 11 SHA-2 test vectors that I converted to Murmur3.

    TestString("abc", 0, 0xB3DD93FA);
    TestString("abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq", 0, 0xEE925B90);
    

    And finally, the big one:

    • Key: "The quick brown fox jumps over the lazy dog"
    • Seed: 0x9747b28c
    • Hash: 0x2FA826CD

    I'd appreciate it if anyone else could confirm any or all of these vectors against their own implementations.

    And, again, these test vectors come from an implementation that passes the SMHasher 256 iteration loop test from KeySetTest.cpp - VerificationTest(...).
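
For reference, the 256-iteration loop can be sketched in Python as below. This is written from my recollection of the SMHasher source rather than copied from it, so treat the details as assumptions to check against `VerificationTest`: the key is the bytes 00, 01, 02, ..., iteration i hashes the first i bytes with seed 256 - i, each 32-bit result is stored little-endian, and the collected 1024 bytes are hashed once more with seed 0. The function name `smhasher_verification` and the `hash32(data, seed)` callback shape are made up for this sketch.

```python
from typing import Callable


def smhasher_verification(hash32: Callable[[bytes, int], int]) -> int:
    """Sketch of the SMHasher verification loop for a 32-bit hash function."""
    key = bytes(range(256))  # 00 01 02 ... FF
    hashes = bytearray()

    # Hash the first i bytes of the key with seed 256 - i, for i = 0..255.
    for i in range(256):
        digest = hash32(key[:i], 256 - i) & 0xFFFFFFFF
        hashes += digest.to_bytes(4, "little")

    # Hash the collected results with seed 0; this is the verification value.
    return hash32(bytes(hashes), 0) & 0xFFFFFFFF
```

Pass in any function that takes the input bytes and a seed and returns the hash as an integer. To my knowledge SMHasher lists 0xB0F57EE3 as the expected verification value for MurmurHash3_x86_32; that constant is not from this answer, so confirm it against the SMHasher source before relying on it.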

    These tests came from my implementation in Delphi. I also created an implementation in Lua (which isn't big on supporting arrays).

    Note: Any code here is released into the public domain. No attribution required.