We want to build a desktop application that searches a locally packaged text database a few GB in size. We are thinking of using Lucene.
The user will search for a few words and the local Lucene index will return the results. However, we want to prevent the user from taking a full text dump of the Lucene index, as the text database is valuable and proprietary. A web application is not the solution here, as the customer would like this desktop application to work in areas where the internet is not available.
How do we encrypt Lucene's index so that only the client application can access it and a prying user can't take a full text dump of it?
One way of doing this, we thought, would be to store the Lucene index on an encrypted file system held within a single file (something like TrueCrypt). The desktop application would then "mount" the file containing the Lucene index.
This also needs to be cross-platform (Linux, Windows). We would be using Qt or Java to write the desktop application.
Is there an easier/better way to do this?
[This is for a client. Yes, yes, conceptually this is a bad thing :-) but this is how they want it. Basically the point is that only the desktop application should be able to access the Lucene index and no one else. Someone pointed out that this is essentially DRM. Yeah, it resembles DRM.]
The problem here is that you're trying to both provide the user with the data and deny it to them at the same time. This is basically the DRM problem under a different name - the attacker (the user) is in full control of the application's environment (hardware and OS). No real security is possible in such a situation, only obfuscation and the illusion of security.
While you can make it harder for the user to get at the unencrypted data, you can never prevent it: the application must be able to decrypt the data in order to work, so everything it needs (the key included) is already on the user's machine, and taking that away would break your app. Probably the closest thing is to ship a sealed hardware box, but IMHO that would make it unusable.
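To make that concrete, here is a minimal sketch in Java of what "encrypt the data and ship the key inside the app" amounts to. Everything in it is an assumption made for illustration: the class name `EmbeddedKeyDemo`, the hard-coded `EMBEDDED_KEY`, the choice of AES-GCM from the standard `javax.crypto` API, and the sample record. It is not a Lucene integration, just the underlying key-handling problem.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

final class EmbeddedKeyDemo {
    // Hypothetical 128-bit key compiled into the application. Anyone who
    // decompiles or debugs the shipped app can read these bytes.
    static final byte[] EMBEDDED_KEY = {
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
        0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
    };

    private static final int IV_LENGTH = 12;  // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    // Encrypts plaintext and returns IV || ciphertext (tag included).
    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] blob = Arrays.copyOf(iv, iv.length + ciphertext.length);
            System.arraycopy(ciphertext, 0, blob, iv.length, ciphertext.length);
            return blob;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypts a blob produced by encrypt(); throws if the key is wrong.
    static byte[] decrypt(byte[] key, byte[] blob) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, blob, 0, IV_LENGTH));
            return cipher.doFinal(blob, IV_LENGTH, blob.length - IV_LENGTH);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // What the vendor does at packaging time.
        byte[] shipped = encrypt(EMBEDDED_KEY,
                "proprietary record".getBytes(StandardCharsets.UTF_8));

        // What a curious user does after pulling the key out of the binary:
        // exactly the same call the application itself has to make.
        byte[] extractedKey = EMBEDDED_KEY.clone();
        byte[] dumped = decrypt(extractedKey, shipped);
        System.out.println(new String(dumped, StandardCharsets.UTF_8));
    }
}
```

The cipher itself is not the weak point here. The decryption path the user would follow is byte for byte the one the application needs, so hiding the key better (splitting it, deriving it at runtime, native code) only raises the effort required.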
Note that making a half-assed illusion of security might be sufficient from a legal standpoint (e.g. DMCA's anti-circumvention clauses) - but that's outside SO's scope.