Tags: python, encryption, data-integrity, tampering

Python: encryption as a means to prevent data tampering


Many of my company's clients use our data acquisition software for research. Due to the nature of research in general, some of the clients ask that data be encrypted to prevent tampering -- there could be serious ramifications if their data were shown to be falsified.

Some of our binary software encrypts output files with a password that is stored in the source and looks like random characters. At the software level, we are able to open encrypted files for read-only operations. If someone really wanted to find the password in order to alter data, it would be possible, but it would take a lot of work.

I'm looking into using Python for rapid development of another piece of software. To duplicate the functionality of encryption to defeat/discourage data tampering, the best idea I've come up with so far is to just use ctypes with a DLL for file reading/writing operations, so that the method of encryption and decryption is "sufficiently" obfuscated.

We are well aware that an "uncrackable" method is unattainable, but at the same time I'm obviously not comfortable with just having the encryption/decryption approaches sitting there in plain text in the Python source code. A "very strong discouragement of data tampering" would be good enough, I think.

What would be the best approach to reach a happy medium of encryption or other proof of data integrity using Python? I saw another post about generating a "tamper proof signature", but if the signature were generated in pure Python, it would be trivial for anyone to generate a signature for arbitrary data. We might be able to phone home to prove data integrity, but that seems like a major inconvenience for everyone involved.


Solution

  • As a general principle, you don't want to use encryption to protect against tampering; instead, you want to use a digital signature. Encryption gives you confidentiality, but you are after integrity.

    Compute a hash value over your data and either store the hash value in a place where you know it cannot be tampered with or digitally sign it.

    In your case, it seems you want to ensure that only your software could have generated the files. As you say, there is no truly secure way to do this when your users have access to the software, since they can take it apart and find any secret keys you include. Given that constraint, I think your idea of using a DLL is about as good as you can do.
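
For illustration, the hash-and-sign idea above might look roughly like the following sketch, using only the standard library. Note that this uses an HMAC (a keyed hash with a shared secret) rather than a true public-key digital signature, because the standard library does not ship one. `SECRET_KEY`, `sha256_digest`, `make_tag`, and `verify_tag` are hypothetical names invented for this example, and the key value is a placeholder. As discussed above, anyone who extracts the key from the software can forge valid tags, so this only raises the bar.

```python
import hashlib
import hmac

# Hypothetical embedded secret (placeholder value). Anyone who recovers it
# from the shipped software can produce valid tags for altered data.
SECRET_KEY = b"replace-with-a-real-secret"


def sha256_digest(data: bytes) -> str:
    """Return the plain (unkeyed) SHA-256 hash of the data as hex."""
    return hashlib.sha256(data).hexdigest()


def make_tag(data: bytes, key: bytes = SECRET_KEY) -> bytes:
    """Return an HMAC-SHA256 tag over the data, keyed with the secret."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_tag(data: bytes, tag: bytes, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(data, key), tag)


if __name__ == "__main__":
    record = b"sample,1.0,2.0\n"
    tag = make_tag(record)                  # store this alongside the data file
    print(verify_tag(record, tag))          # untouched data
    print(verify_tag(record + b"x", tag))   # altered data
```

A plain hash such as `sha256_digest` only helps if the hash itself is stored somewhere the user cannot modify. The keyed tag can travel with the file, but it is only as safe as the key, which is why hiding the key (for example, inside a DLL) remains the weak point.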