Search code examples
c++multithreadingdata-analysis

Strange behaviour in multithread analysis


for a university project we are implementing an algorithm capable of bruteforcing on an AES key that we assume is partially known. We have implemented several versions including one that exploits the multithreading mechanism in C++. The implementation is done by allocating a variable number of threads, to be passed as input at launch, and dividing the key space equally for each thread that will cycle through the respective range attempting each key. De facto the implementation works, as it succeeds in finding the key for any combination #bitsToHack/#threads but returns strange timing results.

    //Structs for threads and respective data
    pthread_t threads[num_of_threads];
    struct bf_data td [num_of_threads];
    int rc;
    //Space division
    uintmax_t index = pow (BASE_NUMBER, num_bits_to_hack);
    uintmax_t step = index/num_of_threads;

    if(sem_init(&s, 1, 0)!=0){
        printf("Error during semaphore initialization\n");
        return -1;
    }

    for(int i = 0; i < num_of_threads; i++){
        //Structure initialization
        td[i].ciphertext = ciphertext;
        td[i].hacked_key = hacked_key;
        td[i].iv_aes = iv_aes;
        td[i].key = key_aes;
        td[i].num_bits_to_hack = num_bits_to_hack;
        td[i].plaintext = plaintext;
        td[i].starting_point = step*i;
        td[i].step = step;
        td[i].num_of_threads = num_of_threads;
        if(DEBUG)
            printf("Starting point for thread %d is: %lu, using step: %lu\n", i , td[i].starting_point, td[i].step);

        rc = pthread_create(&threads[i], NULL, decryption_brute_force, (void*)&td[i]);

        if (rc){
            cout << "Error:unable to create thread," << rc << endl;
            exit(-1);
        }
    }
    sem_wait(&s);
    for(int i = 0; i < num_of_threads; i++){
        pthread_join(threads[i], NULL);
    }

For the decryption_brute_force function (The body of each thread):

void* decryption_brute_force(void* data){
   ** Copy data on local thread memory
   ** Build the key to begin the search from starting point
   ** for each key from starting_point to starting_point + step
       ** Try decryption
       ** if obtained plaintext corresponds to the expected one
          ** Print results, wake up main thread and terminate
       ** else
          ** increment the key and continue

}

To conclude the project we intended to conduct a study of the optimal number of threads expecting an increase in performance as the number of threads increased up to a threshold, after which the system would no longer benefit from the increase in threads assigned to it. At the end of the analysis (a simulation lasting about 9 hours), the results obtained were as follows in figure. Click here to see the plot. We cannot understand why 8 threads performs better than 16. Could it be due to the CPU architecture? Could it be able to schedule 32 and 8 threads better than 16?


Solution

  • From comments, I think it could be the linear-search pattern in each thread yields to different results for different number of threads. Because when you double the threads, the actual linear point to find in a thread may shift to a further point. But once you double again, it can not go much further due to too many threads. Because you said you are using only same encrypted data always. Did you try different inputs?

                 this variable is integer (so it may not be exact distribution)
                 ^
    8 threads & step=7 (56 work total)
                       index-16 (0-based)
                       v
     01234567 89abcdef 01234567 89abcdef
    |        |        |.       |        ...
    500 seconds as its the first loop iteration
    
    16 threads & step=3 (56 work total)
                          index-16 again, but at second-iteration now
                          v
     012 345 678 9ab cde f01 234 567 8
    |   |   |   |   |   | . |   |   |   ...
    1000 seconds as it finds only after second iteration in the thread
    

    Another example with 2 threads and 3 threads:

    x to found at 51-th element of 100-element-work:
    2 threads
    |                     |x(1st iteration)       |
    3 threads
    |             |........x     |                |
    5x slower than 2 threads