c++ optimization md5 password-recovery

What optimizations can be done for this custom MD5 comparison?


I'm trying to optimize this code, which takes strings generated by maskprocessor and piped in on standard input, runs two rounds of the OpenSSL library's MD5 on each one (the second round hashes only part of the result of the first), and compares the outcome to hashes provided on the command line.

It currently runs at approximately 800,000 lines/sec on a Core i7 2.67GHz Lenovo ThinkPad X201, doing 7,311,616 lines in under 10 seconds. I'd really like to see if there is anything that can be done to improve this. I am building with Visual Studio 2012 (and now 2013); the code evolved from bash and then Perl scripts.

I believe the only bottleneck here is the comparison, which I've switched from strcmp to memcmp (although I did not see a large boost from that). The MD5 and maskprocessor generation are beyond my ability to replace with my own code.

This code is part of my project to recover StuffIt 5 passwords by hash collisions, and it works extremely well, but any boost in speed would be a large bonus (especially when multiple instances are run).

An image of the process is available at Performance Perl vs. compiled

I am by no means a competent programmer, and I know that if Hashcat or any of the GPU accelerated password crackers could implement this algorithm it'd blow mine out of the water, but there's not enough demand to get it implemented. Trust me, I asked :(

    #define _CRT_SECURE_NO_WARNINGS
    // Requires OpenSSL: headers on the include path, libs linked, DLLs available at run time
    #include <stdio.h>
    #include <string.h>
    #include <openssl/md5.h>
    #include <iostream>
    #include <ctime>

    using namespace std;

    int main(int argc, char* argv[])
    {
        /* Start - Setup Timer */
        std::clock_t start;
        double duration;
        start = std::clock();
        /* End - Setup Timer */

        /* Start - Hash Length Check */
        for (int i = 1; i < argc; i++) {
            size_t j = strlen(argv[i]);
            if (j != 10) {
                std::cout << argv[i] << " is " << j << " characters long - not 10! Quitting\n";
                return 1; // non-zero exit code signals the bad argument
            }
        }
        /* End - Hash Length Check */

        /* Start - Line Entry and Count */
        char string[40];
        long long linecount = 0; // caps at 9,223,372,036,854,775,807
        int millioncount = 0;
        // printf("Enter a string: ");
        while (fgets(string, 40, stdin)) {

            /* remove newline, if present (guarding against an empty buffer) */
            size_t len = strlen(string);
            if (len > 0 && string[len - 1] == '\n')
                string[--len] = '\0';

            // printf("This is your string: %s", string);
            linecount++;
            if (linecount % 10000000 == 0) {
                millioncount++;
                duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
                double linespersec = linecount / duration;
                std::cout << millioncount * 10 << " million tries (" << linespersec << " l/s)\n";
            }
            /* End - Line Entry */

            /* Start - MD5 Round 1 */
            unsigned char digest[MD5_DIGEST_LENGTH];
            char string2[5];
            MD5((unsigned char*)string, len, digest); // len was already computed above
            /* End - MD5 Round 1 */

            /* Start - MD5 Round 2 */
            // Hash the first 5 bytes (10 hex digits) of the round-1 digest a second time
            //for(int i = 0; i < 5; i++)
                //string2[i] = digest[i];
            memcpy(string2, digest, 5);
            MD5((unsigned char*)string2, 5, digest);

            char mdString3[33];

            for (int i = 0; i < 5; i++)
                sprintf(&mdString3[i*2], "%02x", (unsigned int)digest[i]);

            // printf("\nmd5 digest: %s\n", mdString3);
            /* End - MD5 Round 2 */

            /* Start - Hash Check */
            for (int i = 1; i < argc; i++) {

                //if (mdString3[0] == argv[i][0] && strcmp(mdString3, argv[i]) == 0){ // added the 0 comp, no real improvements
                if (memcmp(mdString3, argv[i], 10) == 0) { // 785-795k
                    printf("Success at: %s", string);
                    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
                    double linespersec = linecount / duration;
                    std::cout << " for " << argv[i] << " in " << duration << " seconds at line " << linecount << " (" << linespersec << " l/s)\n";
                }
            }
            /* End - Hash Check */
        }

        /* Start - Timer Closeout */
        duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
        double linespersec = linecount / duration;
        std::cout << "Exhausted search of " << linecount << " lines in " << duration << " seconds (" << linespersec << " l/s)\n";
        /* End - Timer Closeout */
        return 0;
    }

Solution

  • You can get an improvement by eliminating the loop around the sprintf call and using this single call instead:

    sprintf(&mdString3[0], 
            "%02x%02x%02x%02x%02x", 
            (unsigned int)digest[0],
            (unsigned int)digest[1], 
            (unsigned int)digest[2], 
            (unsigned int)digest[3], 
            (unsigned int)digest[4]);
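
A further step in the same direction, offered as an untested sketch rather than a measured result: skip the hex formatting in the hot loop entirely. The idea is to convert each 10-character hash from the command line into 5 raw bytes once at startup, then compare the first 5 bytes of the second digest against those bytes directly. The names `Target`, `hex_value`, `parse_target` and `find_match` below are hypothetical helpers made up for illustration; they are not part of OpenSSL or of the question's code. One assumption to be aware of: this parser accepts uppercase hex digits, whereas the original string comparison only ever matches lowercase input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type: one target hash stored as 5 raw bytes
// instead of 10 hex characters.
struct Target {
    unsigned char bytes[5];
};

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parse exactly 10 hex characters into 5 bytes.
// Returns false (and leaves 'out' untouched) on malformed input.
static bool parse_target(const char* text, Target& out)
{
    assert(text != 0);
    if (std::strlen(text) != 10) return false;

    Target parsed;
    for (int i = 0; i < 5; ++i) {
        int hi = hex_value(text[2 * i]);
        int lo = hex_value(text[2 * i + 1]);
        if (hi < 0 || lo < 0) return false;
        parsed.bytes[i] = static_cast<unsigned char>(hi * 16 + lo);
    }
    out = parsed;
    return true;
}

// Index of the first target equal to the first 5 digest bytes, or -1 if none.
static int find_match(const unsigned char* digest,
                      const Target* targets, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        if (std::memcmp(digest, targets[i].bytes, 5) == 0)
            return static_cast<int>(i);
    }
    return -1;
}
```

In the question's program this would mean calling `parse_target` on each `argv[i]` before the `fgets` loop (it can replace the length check), calling `find_match(digest, ...)` right after the second `MD5` call, and formatting the hex string only when a hit needs to be printed. How much this gains is something to measure: the two MD5 calls and reading from the pipe may well dominate the run time, in which case the difference will be small.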