c++, encryption, licensing, license-key

How to prevent a file from being tampered with


I want to store confidential data in a digitally signed file, so that I know when its contents have been tampered with.

My initial thought is that the data will be stored in NVPs (name value pairs), with some kind of CRC or other checksum to verify the contents.
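
For illustration, a plain checksum over such a record might look like the sketch below. It assumes C++11 or later; `crc32` and `checksum_matches` are hypothetical helper names, and the `age=42;weight=70` record layout used when trying it out is made up. A CRC has no secret input, so it can reveal accidental corruption, but anyone who edits the file can simply recompute it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320
// (the variant used by zlib and PNG).
std::uint32_t crc32(const std::string& bytes) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        crc ^= static_cast<unsigned char>(bytes[i]);
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the bit shifted out was 1.
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// True when the stored checksum matches the record's current contents.
// Hypothetical helper: it proves nothing about who wrote the record.
bool checksum_matches(const std::string& record, std::uint32_t stored_crc) {
    return crc32(record) == stored_crc;
}
```

The standard check input `"123456789"` gives `0xCBF43926` with this variant, which is a quick way to confirm the routine is wired up correctly.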

I am thinking of implementing the creation (i.e. writing) and verification (reading) of such a file in ANSI C++.

Assuming this is the data I want to store:

    //Unencrypted, raw data to be stored in file
    struct PrivateInfo {
         double age, weight;
         FitnessScale fitness;
         Location  loc;
         OtherStuff stuff;
    };

    //128-bit Encrypted Data (Payload to be stored in file)
    struct EncryptedData {
     // unknown fields/format ??

    };

[After I have read a few responses to this question]

Judging by the comments I have received so far, I fear people are getting sidetracked by the word "licensing", which seems to be a red flag for most people. I suspected that might be the case, but in today's atmosphere of heightened security and general nervousness, I thought I had better spell out what I needed to be "hiding", lest someone think I was planning to pass the "nuke password" to terrorists or something. I will now remove the word "license" from my question.

View it more as a technical question. Imagine I am a student (which I am) trying to find out about recommended (or best) practices for encoding information that needs to be secure.

Mindful of the above, I will rephrase my questions thus:

  1. Given a struct with fields of different data types, what is the "recommended" algorithm for giving it "reasonably secure" encryption? (I still prefer to use 128-bit, but that's just me.)
  2. What is a recommended way of providing a ROBUST check on the encrypted data, so that I can use the check value to tell whether the contents of the file (the payload of encrypted data) differ from the original?

Solution

  • First, note that "signing" data (to notice when it has been tampered with) is a completely separate and independent operation from "encrypting" data (to prevent other people from reading it).

    That said, the OpenPGP standard does both. GnuPG is a popular implementation: http://www.gnupg.org/gph/en/manual.html

    Basically you need to:

    • Generate a keypair, but don't bother publishing the public part.
    • Sign and encrypt your data (this is a single operation in gpg)
    • ... storage ...
    • Decrypt and check the signature (this is also a single operation).

    But beware: this is only useful if you can store your private key more securely than you store the rest of the data. If you can't guarantee the security of the key, then GPG can't protect you against a malicious attempt to read or tamper with your data, and neither can any other encryption/signing scheme.

    Leaving encryption aside, you might think that you can sign the data on some secure server using the private key, then validate it on a user's machine using the public key. This is fine as far as it goes, but if the user is malicious and clever, they can invent new data, sign it using their own private key, and modify your code to replace your public key with theirs. Their data will then validate. So, depending on your threat model, you still need the storage of the public key to be tamper-proof.

    You can implement an equivalent yourself, something along the lines of:

    • Choose a longish string of random characters. This is your key.
    • Concatenate your data with the key. Hash this with a secure hash function (SHA-256). Then concatenate the resulting hash with your data, and encrypt it using the key and a secure symmetric cipher (AES).
    • ... storage ...
    • Decrypt the data, chop off the hash value, put back the key, hash it, and compare the result to the hash value to verify that it has not been modified.
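
    The four steps above can be sketched in C++ roughly as follows. This is an outline under stated assumptions, not a vetted implementation: the names `Primitives`, `seal_record` and `open_record` are hypothetical, and the hash and cipher are passed in by the caller on the assumption that they wrap a well-tested library's SHA-256 and AES rather than being written by hand.

```cpp
#include <cstddef>
#include <functional>
#include <string>

using Bytes = std::string;  // raw bytes held in a std::string

// Caller-supplied building blocks (hypothetical interface). In practice these
// would wrap a vetted library's SHA-256 and AES routines.
struct Primitives {
    std::function<Bytes(const Bytes& input)> hash;
    std::size_t hash_len;  // hash output size in bytes (32 for SHA-256)
    std::function<Bytes(const Bytes& key, const Bytes& plain)> encrypt;
    std::function<Bytes(const Bytes& key, const Bytes& cipher)> decrypt;
};

// Writing: hash(data || key), append the hash to the data, encrypt the lot.
Bytes seal_record(const Primitives& p, const Bytes& key, const Bytes& data) {
    const Bytes digest = p.hash(data + key);
    return p.encrypt(key, data + digest);
}

// Reading: decrypt, chop off the hash, recompute it, compare.
// Returns false (leaving data_out untouched) if the check fails.
bool open_record(const Primitives& p, const Bytes& key,
                 const Bytes& stored, Bytes& data_out) {
    const Bytes plain = p.decrypt(key, stored);
    if (plain.size() < p.hash_len) return false;  // too short to hold a hash

    const Bytes data = plain.substr(0, plain.size() - p.hash_len);
    const Bytes stored_hash = plain.substr(plain.size() - p.hash_len);
    const Bytes expected = p.hash(data + key);
    if (expected.size() != stored_hash.size()) return false;

    // Compare every byte so the loop does not exit early on a mismatch.
    unsigned char diff = 0;
    for (std::size_t i = 0; i < expected.size(); ++i)
        diff |= static_cast<unsigned char>(expected[i] ^ stored_hash[i]);
    if (diff != 0) return false;

    data_out = data;
    return true;
}
```

    To try it, fill in `Primitives` with wrappers around whatever crypto library you use. Details such as the cipher mode, IV and padding are deliberately left to those wrappers.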

    This will likely be faster and use less code in total than gpg: for starters, PGP is public key cryptography, and that's more than you require here. But rolling your own means you have to do some work, and write some of the code, and check that the protocol I've just described doesn't have some stupid error in it. For example, it has potential weaknesses if the data is not of fixed length, which HMAC solves.
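
    For reference, the HMAC construction mentioned above (RFC 2104) wraps the hash in two keyed passes instead of hashing a plain concatenation. The sketch below shows only the structure: the function name `hmac` and the `HashFn` type are hypothetical, and the hash is again assumed to come from a library. In real code you would most likely call the library's own HMAC routine rather than assemble it yourself.

```cpp
#include <cstddef>
#include <functional>
#include <string>

using Bytes = std::string;  // raw bytes held in a std::string
using HashFn = std::function<Bytes(const Bytes&)>;

// HMAC(K, m) = H((K' ^ opad) || H((K' ^ ipad) || m)), as defined in RFC 2104.
// block_size is the hash's internal block size in bytes (64 for SHA-256).
Bytes hmac(const HashFn& hash, std::size_t block_size,
           const Bytes& key, const Bytes& message) {
    // Keys longer than one block are hashed first; the result is
    // zero-padded up to exactly one block.
    Bytes k = key.size() > block_size ? hash(key) : key;
    k.resize(block_size, '\0');

    Bytes inner_pad(block_size, '\0');
    Bytes outer_pad(block_size, '\0');
    for (std::size_t i = 0; i < block_size; ++i) {
        inner_pad[i] = static_cast<char>(k[i] ^ 0x36);  // ipad byte
        outer_pad[i] = static_cast<char>(k[i] ^ 0x5c);  // opad byte
    }
    // Inner pass binds the key to the message; outer pass binds it again
    // to the inner digest.
    return hash(outer_pad + hash(inner_pad + message));
}
```

    The resulting tag would take the place of the plain hash in the scheme above, and verification recomputes it and compares.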

    Good security avoids doing work that some other, smarter person has done for you. This is the virtuous kind of laziness.