I am trying to generate md5 hash from Powershell. I installed Powershell Community Extension (Pscx) to get command : Get-Hash
However when I generate md5 hash using Get-Hash
, it doesn't seem to match the hash generated using md5sum
on an Ubuntu machine.
PS U:\> "hello world" | get-hash -Algorithm MD5
Path Algorithm HashString Hash
---- --------- ---------- ----
MD5 E42B054623B3799CB71F0883900F2764 {228, 43, 5, 70...}
root@LT-A03433:~# echo "hello world" | md5sum
6f5902ac237024bdd0c176cb93063dc4 -
I know that the one generated by Ubuntu is correct as a couple of online sites show the same result.
What am I going wrong with Powershell Get-Hash?
The difference is not obvious, but you are not hashing the same data. MD5 is a hashing algorithm, and it has no notion of text encoding – this is why you can create a hash of binary data just as easily as a hash of text. With that in mind, we can find out what bytes (or octets; strictly a stream of values of 8 bits each) MD5 is calculating the hash of. For this, we can use xxd
, or any other hexeditor.
First, your Ubuntu example:
$ echo "hello world" | xxd
0000000: 6865 6c6c 6f20 776f 726c 640a hello world.
Note the 0a
, Unix-style newline at the end, displayed as .
in the right view. echo
by default appends a newline to what it prints, you could use printf
, but this would lead to a different hash.
$ echo "hello world" | md5
6f5902ac237024bdd0c176cb93063dc4
Now let's consider what PowerShell is doing. It is passing a string of its own directly to the get-hash
cmdlet. As it turns out, the natural representation of string data in a lot of Windows is not the same as for Unix – Windows uses wide strings, where each character is represented (in memory) as two bytes. More specifically, we can open a text editor, paste in:
hello world
With no trailing newline, and save it as UTF-16, little-endian. If we examine the actual bytes this produces, we see the difference:
$ xxd < test.txt
0000000: 6800 6500 6c00 6c00 6f00 2000 7700 6f00 h.e.l.l.o. .w.o.
0000010: 7200 6c00 6400 r.l.d.
Each character now takes two bytes, with the second byte being 00
– this is normal (and is the reason why UTF-8 is used across the Internet instead of UTF-16, for example), since the Unicode codepoints for basic ASCII characters are the same as their ASCII representation. Now let's see the hash:
$ md5 < thefile.txt
e42b054623b3799cb71f0883900f2764
Which matches what PS is producing for you.
So, to answer your question – you're not doing anything wrong. You just need to encode your string the same way to get the same hash. Unfortunately I don't have access to PS, but this should be a step in the right direction: UTF8Encoding class.