sparql gremlin graph-databases tinkerpop amazon-neptune

Anonymous Authentication by default in Amazon Neptune?

I'm learning about Amazon Neptune, and noticed that:

IAM authentication is not enabled by default
IAM authentication requires AWS Signature v4 for API calls, which increases application complexity

By default, it seems that Amazon Neptune uses anonymous authentication, as I didn't have to provide any API keys, username / password combinations, or certificates for authentication. Additionally, the code sample provided by AWS doesn't include any authentication details.

It appears that the only default security options for Amazon Neptune are network-level VPC Security Groups.

According to the What is Neptune? documentation, the service claims to be "highly secure." In my opinion, a service that does not support application-level authentication by default, is not "highly secure."

Neptune provides multiple levels of security for your database. Security features include network isolation using Amazon VPC, and encryption at rest using keys that you create and control through AWS Key Management Service (AWS KMS). On an encrypted Neptune instance, data in the underlying storage is encrypted, as are the automated backups, snapshots, and replicas in the same cluster.

Question: Why does Amazon Neptune use an insecure configuration by default, and is there a way to enable authentication without using the complicated IAM integrated authentication?

Solution

You've got some very valid points in there, so let me go through them in detail by providing some context.

By default, it seems that Amazon Neptune uses anonymous authentication...

This is intentional. The query languages that Neptune supports right now are Gremlin and SPARQL, both of which are built on top of HTTP/HTTPS without any sort of auth. (Basic Auth is supported in Gremlin, but that is not something clients use in production anyway. I'd need at least some form of message digest auth to call it secure, but unfortunately the language spec does not include this.) As these languages are open, there is a lot of open source client code out there that assumes it is dealing with an unauthenticated endpoint. As a result, purely from an adoption point of view, Neptune chose to keep its request layer unauthenticated by default. By contrast, if you explore other DB engines within AWS (say Aurora MySQL), the backing DB engine does support auth as its default posture.

This does not mean that it is the right thing to do, so I'll let someone from the Gremlin/SPARQL community comment on whether the spec should enforce authentication as the default posture or not.
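To make the default posture concrete, here is a minimal sketch of what a request to an unauthenticated endpoint looks like. It assumes the HTTP endpoint for Gremlin on port 8182 with a JSON body of the form {"gremlin": ...}; the hostname is a placeholder and build_gremlin_request is a hypothetical helper name. The request is only built, not sent. The point is that no credentials or Authorization header are involved.

```python
import json
import urllib.request

# Placeholder values (assumptions); substitute your own cluster endpoint.
NEPTUNE_HOST = "your-neptune-endpoint.example.com"
NEPTUNE_PORT = 8182


def build_gremlin_request(query: str) -> urllib.request.Request:
    """Build (but do not send) a plain HTTPS POST carrying a Gremlin query."""
    url = f"https://{NEPTUNE_HOST}:{NEPTUNE_PORT}/gremlin"
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_gremlin_request("g.V().limit(1)")

# With IAM auth disabled, nothing in the request identifies the caller.
print(request.has_header("Authorization"))  # False
```

Anything that can reach the endpoint over the network can send this request, which is why the security group rules carry so much weight in the default configuration.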

It appears that the only default security options for Amazon Neptune are network-level VPC Security Groups.

Security groups (SGs) provide network-level access control, and we support TLS 1.2 by default (as of the newest engine versions), which tightens up your client -> db connection as well.
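On the client side, you can also refuse to negotiate anything older than TLS 1.2. The following is a minimal sketch using Python's standard ssl module; make_tls12_context is a hypothetical helper name, and whether the default trust store is sufficient for your endpoint's certificate is an assumption you should verify for your environment.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Return a client-side context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls12_context()
```

A context like this can be passed to most Python HTTP and WebSocket clients that accept an ssl.SSLContext, for example as the context argument of urllib.request.urlopen.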

The service claims to be "highly secure." In my opinion, a service that does not support application-level authentication by default, is not "highly secure."

In addition to the details called out above, the "highly secure" aspect of Neptune is not limited to the client -> db connection. Your data is replicated six ways and stored across 3 AZs. This involves a lot of communication, not just from the DB, but among the backing storage nodes as well. All of these communications are guarded by industry-standard security protocols. Encryption at rest for the distributed store is an interesting case study on its own. The same standards apply to operator access to the machines, auditing, data safety during snapshotting and restoring, and so on. In short, while I do agree that the default posture should be SigV4 (or some open standard) auth enabled, I also want to make sure you get some clarity on why we claim to be a highly secure DB, much like any other product that AWS provides.

Is there a way to enable authentication without using the complicated IAM integrated authentication?

SigV4 is the standard that most AWS services support. I do agree that it would have been a lot easier if there were an SDK that customers could use directly. We did vend out SigV4 plugins for some of the clients (especially Java and Python), and they actually have pretty good uptake. Do try them out and share feedback on which parts of the integration were painful, and we'd be more than happy to take a look.
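For a sense of what SigV4 adds to a request, here is a hand-rolled sketch of the signing steps using only the Python standard library. It is not one of the plugins mentioned above, and in practice you would normally let a maintained library do this. The function names, the placeholder host, and the dummy credentials are all assumptions for illustration; "neptune-db" is the service name used in the credential scope for Neptune data plane requests. Only the host and x-amz-date headers are signed here, and the timestamp is fixed so the output is reproducible.

```python
import datetime
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Chain of HMACs that scopes the secret key to a date, region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_request(method, host, path, payload, access_key, secret_key,
                 region, now, service="neptune-db"):
    """Return the headers a SigV4-signed request would carry (no query string)."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Step 1: canonical request (the empty string is the canonical query string).
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    payload_hash = hashlib.sha256(payload).hexdigest()
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_hash]
    )

    # Step 2: string to sign.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Step 3: signature from the derived signing key.
    signing_key = derive_signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    authorization = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return {"Host": host, "X-Amz-Date": amz_date, "Authorization": authorization}


# Dummy credentials and a placeholder host (assumptions, not real values).
headers = sign_request(
    method="POST",
    host="your-neptune-endpoint.example.com:8182",
    path="/gremlin",
    payload=b'{"gremlin": "g.V().limit(1)"}',
    access_key="AKIDEXAMPLE",
    secret_key="wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY",
    region="us-east-1",
    now=datetime.datetime(2020, 1, 1, 12, 0, 0, tzinfo=datetime.timezone.utc),
)
print(headers["Authorization"])
```

In real use the timestamp would be the current UTC time, and temporary credentials would also require an X-Amz-Security-Token header. The extra work compared with an unauthenticated request is exactly these three steps, repeated for every request.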

EDIT 1: The OP's discussion here was about security between the client and the database, so the security practices in the opaque backing data store that I described above aren't really relevant. In other words, the discussion here is entirely about the data plane API of Neptune and whether we could be secure by default rather than opt-in.