
How to separate a person's identity from his personal data?


I'm writing an app whose main purpose is to keep a list of users' purchases.

I would like to ensure that even I as a developer (or anyone with full access to the database) could not figure out how much money a particular person has spent or what he has bought.

I initially came up with the following scheme:

    --------------+------------+-----------
    user_hash     | item       | price
    --------------+------------+-----------
    a45cd654fe810 | Strip club |     400.00
    a45cd654fe810 | Ferrari    | 1510800.00
    54da2241211c2 | Beer       |       5.00
    54da2241211c2 | iPhone     |     399.00

  • User logs in with username and password.
  • From the password, calculate user_hash (possibly with salting etc.).
  • Use the hash to access the user's data with normal SQL queries.

Given enough users, it should be almost impossible to tell how much money a particular user has spent by just knowing his name.

Is this a sensible thing to do, or am I completely foolish?


Solution

  • The problem is that if someone already has full access to the database then it's just a matter of time before they link up the records to particular people. Somewhere in your database (or in the application itself) you will have to make the relation between the user and the items. If someone has full access, then they will have access to that mechanism.

    There is absolutely no way of preventing this.

    The reality is that by having full access we are in a position of trust. This means that the company managers have to trust that even though you can see the data, you will not act in any way on it. This is where little things like ethics come into play.

    Now, that said, a lot of companies separate the development and production staff. The purpose is to keep developers from having direct contact with live (i.e., real) data. This has a number of advantages, with security and data reliability at the top of the heap.

    The only real drawback is that some developers believe they can't troubleshoot a problem without production access. However, this is simply not true.

    Production staff would then be the only ones with access to the live servers. They will typically be vetted to a greater degree (criminal history and other background checks), commensurate with the type of data you have to protect.

    The point of all this is that this is a personnel problem, not one that can truly be solved by technical means.


    UPDATE

    Others here seem to be missing a vital piece of the puzzle: the data is being entered into the system for a reason. That reason is almost universally so that it can be shared. In the case of an expense report, that data is entered so that accounting can know who to pay back.

    Which means that the system, at some level, will have to match users and items without the data entry person (e.g., a salesperson) being logged in.

    And because that data has to be tied together without all parties involved standing there to type in a security code to "release" the data, a DBA will absolutely be able to review the query logs to figure out who is who. And very easily, I might add, regardless of how many hashes you throw at it. Triple DES won't save you either.

    At the end of the day, all you've done is make development harder with absolutely zero security benefit. I can't emphasize this enough: the only way to hide data from a DBA is either (1) for that data to be accessible only by the very person who entered it, or (2) for it not to exist in the first place.

    Regarding option 1: if the only person who can ever access the data is the person who entered it, then there is no point in having it in a corporate database.
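
To make the query-log point from the update concrete, here is a small Python sketch of the kind of correlation someone with log access could run. The log format, the names, the timestamps and the two-second window are all invented for illustration; real logs will look different, but the idea of matching a login event to the hashed queries that follow it is the same.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def link_identities(logins, queries, window=timedelta(seconds=2)):
    """Guess which username is behind each user_hash.

    logins:  iterable of (username, timestamp) from a hypothetical auth log
    queries: iterable of (user_hash, timestamp) from a hypothetical query log
    A query "votes" for every username that logged in at most `window`
    before it; the username with the most votes wins.
    """
    votes = defaultdict(Counter)
    for user_hash, query_time in queries:
        for username, login_time in logins:
            if timedelta(0) <= query_time - login_time <= window:
                votes[user_hash][username] += 1
    return {
        user_hash: counts.most_common(1)[0][0]
        for user_hash, counts in votes.items()
    }


# Made-up log excerpts for the example.
logins = [
    ("alice", datetime(2024, 1, 1, 9, 0, 0)),
    ("bob", datetime(2024, 1, 1, 9, 5, 0)),
    ("alice", datetime(2024, 1, 1, 10, 0, 0)),
]
queries = [
    ("a45cd654fe810", datetime(2024, 1, 1, 9, 0, 1)),
    ("a45cd654fe810", datetime(2024, 1, 1, 9, 5, 1)),  # alice still active as bob logs in
    ("54da2241211c2", datetime(2024, 1, 1, 9, 5, 1)),
    ("a45cd654fe810", datetime(2024, 1, 1, 10, 0, 1)),
]

linked = link_identities(logins, queries)
```

With this sample data, `linked` maps `a45cd654fe810` to `alice` and `54da2241211c2` to `bob`, even though one of alice's queries overlaps with bob's login. More sessions only make the match more reliable.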