mongodb encryption ycsb

Huge runtime difference between running YCSB with and without encryption with Workload E


I ran some tests with YCSB against MongoDB Enterprise, with and without encryption at rest. I was using the default workloads and found some weird results when running Workload E.

Without encryption the runtime was about 13 minutes, but when I switched to an encrypted database the runtime jumped to a suspicious 17 hours!

There must be something wrong, but I can't figure out what it could be. All the tests use an operation count of 100K and a record count of 10M, and I'm rebooting the system after each run. I would appreciate some help figuring this one out.


Solution

  • YCSB does not perform any encryption itself; it relies on the MongoDB Java driver. Have you checked the MongoDB documentation?

    Which type of encryption are you using?

    I don't find your result that surprising. According to your question, your workload file looks like:

    recordcount=10000000
    operationcount=100000
    readproportion=0
    updateproportion=0
    scanproportion=0.95
    insertproportion=0.05
    requestdistribution=zipfian
    maxscanlength=100
    scanlengthdistribution=uniform
    

    This is a very scan-intensive workload. First, scans are among the slowest operations in most data stores. Second, as a rough estimate, assuming each operation costs 250 ms for encryption and 400 ms for decryption (with encryption at rest, this work is done on the server side), the overhead is (0.25 + 0.4) * 100000 = 65000 seconds, i.e. about 18 hours.
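
    The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the per-operation costs are the assumed figures from the paragraph, not measurements, and the function name is made up for illustration. It also assumes the operations run one after another.

```python
# Back-of-the-envelope estimate. The per-operation costs below are the
# assumed figures from the answer, not measured values.
ENCRYPT_SECONDS = 0.25      # assumed encryption cost per operation
DECRYPT_SECONDS = 0.40      # assumed decryption cost per operation
OPERATION_COUNT = 100_000   # operationcount from the workload file


def estimated_overhead_hours(operations, encrypt_s, decrypt_s):
    """Total encryption overhead in hours, assuming serial operations."""
    total_seconds = operations * (encrypt_s + decrypt_s)
    return total_seconds / 3600


hours = estimated_overhead_hours(OPERATION_COUNT, ENCRYPT_SECONDS, DECRYPT_SECONDS)
print(f"Estimated overhead: {hours:.1f} hours")
```

    Running YCSB with several client threads would spread this cost across threads, so the wall-clock figure depends on how the benchmark was launched.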

    EDIT

    According to your comments, you are using AES256 and comparing Workloads A and E. Workload A is about 50% reads and 50% writes (updates). If you're using the standard YCSB row size, each row represents 1 kB (10 fields, 100 B each).

    So, for 100k operations, you are manipulating the following amounts of data:

    • Workload A: 100000*0.5*1kB + 100000*0.5*1kB = 100 MB
    • Workload E: 100000*0.95*100*1kB + 100000*0.05*1kB = 9505 MB, because each scan can read up to 100 rows (maxscanlength=100), so this is an upper bound

    Since the cost of AES grows roughly linearly with the amount of data processed, you encrypt about 95 times more data with Workload E than with Workload A, which explains the time difference.
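
    The data volumes in the list above can be checked with a short script. This is a sketch under the same assumptions as the answer: 1 kB rows, 100k operations, and every scan counted at the full maxscanlength of 100 rows. The helper name and the `(proportion, rows)` layout are made up for illustration.

```python
ROW_KB = 1             # 10 fields x 100 B each (YCSB default row size)
OPERATIONS = 100_000   # operationcount
SCAN_ROWS = 100        # maxscanlength, used here as rows per scan (upper bound)


def workload_volume_mb(ops, mix, row_kb=ROW_KB):
    """Data touched in MB, where mix is a list of (proportion, rows_per_op)."""
    total_kb = sum(ops * proportion * rows * row_kb for proportion, rows in mix)
    return total_kb / 1000  # 1 MB = 1000 kB, as in the answer


# Workload A: 50% reads and 50% updates, one row each.
workload_a = workload_volume_mb(OPERATIONS, [(0.5, 1), (0.5, 1)])
# Workload E: 95% scans of SCAN_ROWS rows and 5% single-row inserts.
workload_e = workload_volume_mb(OPERATIONS, [(0.95, SCAN_ROWS), (0.05, 1)])
ratio = workload_e / workload_a

print(f"Workload A: {workload_a:.0f} MB")
print(f"Workload E: {workload_e:.0f} MB")
print(f"E touches about {ratio:.0f}x the data of A")
```

    Changing `SCAN_ROWS` to the average scan length you actually expect gives a less pessimistic figure for Workload E.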