It is just for a hobby, no official or serious stuff involved. I do have a question though; when running this code, it computes around 400 million md5 hashes in the first second, then it drops down to ~10 million hashes per second, is there a way to speed this up?
Here is the code:
using System;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;
using System.Runtime.InteropServices;
using Org.BouncyCastle.Crypto.Digests;
using Org.BouncyCastle.Crypto.Paddings;
public class SimdMD5HashFinder
{
private static readonly int numberOfThreads = 16; // Number of threads (cores)
private static readonly CancellationTokenSource cts = new CancellationTokenSource(); // Cancellation token for cancellation management
private const int batchSize = 25000000; // Number of random byte arrays to process in each batch
private const int byteArraySize = 16; // Size of each random byte array (16 bytes for MD5 input)
public static ulong n = 0;
[DllImport("kernel32.dll")]
private static extern IntPtr GetCurrentThread();
[DllImport("kernel32.dll", SetLastError = true)]
private static extern IntPtr SetThreadAffinityMask(IntPtr hThread, IntPtr dwThreadAffinityMask);
public static void Main(string[] args)
{
Console.WriteLine("Starting multiple threads to find MD5 hashes with the first three bytes equal to zero...");
Task[] tasks = new Task[numberOfThreads];
for (int i = 0; i < numberOfThreads; i++)
{
int threadId = i; // Capture the current thread index
tasks[i] = Task.Run(() => FindHash(threadId));
}
// Wait for tasks to complete
Task.WaitAll(tasks);
Console.WriteLine("All threads have completed or a match was found.");
Console.ReadKey();
}
private static void FindHash(int threadId)
{
// Pin the thread to a specific CPU core
PinThreadToCore(threadId);
// Create MD5 digest object from BouncyCastle
MD5Digest md5 = new MD5Digest();
byte[] randomBytes = new byte[batchSize * byteArraySize]; // Single large array to hold all random bytes for the batch
byte[] hashBytes = new byte[md5.GetDigestSize()]; // MD5 hash is always 16 bytes
RandomNumberGenerator rng = RandomNumberGenerator.Create();
while (!cts.Token.IsCancellationRequested) // Loop until cancellation is requested
{
n += (ulong)batchSize; // Increment n by batchSize
if (n % 1000000 == 0)
{
Console.WriteLine(n);
}
// Fill the large buffer with random bytes
rng.GetBytes(randomBytes);
// Process the buffer in chunks, each chunk is a 16-byte random array
for (int i = 0; i < randomBytes.Length; i += byteArraySize)
{
// Calculate MD5 hash of the 16-byte chunk
md5.BlockUpdate(randomBytes, i, byteArraySize);
md5.DoFinal(hashBytes, 0);
// Check if the first three bytes of the hash are all zero
if (hashBytes[0] == 0 && hashBytes[1] == 0 && hashBytes[2] == 0 && hashBytes[3] == 0)
{
Console.WriteLine($"\nThread {threadId} found a match!");
Console.WriteLine($"Random Bytes: {BitConverter.ToString(randomBytes, i, byteArraySize).Replace("-", "").ToLower()}"); // Show the chunk of random bytes as hex
Console.WriteLine($"MD5 Hash Bytes: {BitConverter.ToString(hashBytes).Replace("-", "").ToLower()}");
// Signal cancellation to stop all other threads
cts.Cancel();
return; // Exit the method once a match is found
}
// Reset the MD5 digest for the next round
md5.Reset();
}
}
}
// Method to pin a thread to a specific core
private static void PinThreadToCore(int coreId)
{
IntPtr mask = new IntPtr(1 << coreId); // Create affinity mask for the given core
IntPtr thread = GetCurrentThread(); // Get current thread handle
// Set the affinity mask to bind the thread to the specific core
SetThreadAffinityMask(thread, mask);
}
}
I tried almost everything, but this is the fastes way of doing things as far as I know. I create a buffer of size N, since otherwise it slows down because of the overhead of creating a new MD5 object every time. I tried looking into GPU computing with ILGPU but that i found way too compicated.
There are a couple possible improvements.
Side note: I'm not terribly familiar with C#, so I might make mistakes.
Don't use random bytes for the hashes. Any decent cryptographic hash will transform the input to something that looks completely random, so using random values adds a bunch more work, while a counter is just as good.
On the topic of the counter, you have two main options: a BigInteger
or an unsigned long counter.
The BigInteger
is simpler, but is much slower. On the other hand, you could use an unsigned long counter (one per task) and a unique-per-task one-byte ID. That would give you ~18 quintillion iterations per thread before running out.
You could add a few micro-optimizations, like:
On a similar note, if you really want to make it fast...