google-cloud-platform, google-bigquery, google-cloud-endpoints, api-key

How to manage/design API access on GCP?


Let's say I have 3 datasets in BigQuery: Dataset A, Dataset B and Dataset C.

I also have 3 clients: Client A, Client B and Client C.

And I have a simple web app deployed on App Engine that exposes an API endpoint, say '/weather'. The endpoint builds a query from the client's input, reads from and writes to the datasets through the BigQuery API, and returns the result.

Clients A, B and C each have their own API key so that they can use the weather API.

But I want to restrict API access such that Client A can only access Dataset A, Client B can only access Dataset B and Client C can only access Dataset C.

But if Client A needs to access Dataset B too, I also want to be able to grant Client A access to Dataset B easily, without having to redeploy my app.

I've done a lot of reading on Cloud Endpoints, App Engine and BigQuery, but I couldn't really find a solution.

What is the best way to achieve this, ideally at the Cloud Endpoints, App Engine or BigQuery level? If that is not possible, then at the back-end Python level (I am using Flask).

The last resort I can think of is to keep a simple dictionary in a DB, where the key is the API key and the value is the list of datasets that key can access. Then, when a client hits the endpoint with its API key, I check whether that key has access to the requested dataset.

But that would be quite an expensive operation, and I would prefer to take care of this at the GCP level or the back-end Python level.
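For illustration, the dictionary lookup described above could be sketched as follows. This is only a minimal in-memory sketch: the key strings, dataset names, and the helpers `can_access` and `grant` are hypothetical, and a real setup would load the mapping from a database rather than hard-code it.

```python
# Hypothetical mapping: API key -> set of datasets that key may query.
ACL = {
    "key-client-a": {"dataset_a"},
    "key-client-b": {"dataset_b"},
    "key-client-c": {"dataset_c"},
}


def can_access(api_key, dataset, acl=ACL):
    """Return True if the API key is allowed to query the dataset."""
    # Unknown keys fall back to an empty set, so they are always denied.
    return dataset in acl.get(api_key, set())


def grant(api_key, dataset, acl=ACL):
    """Give an API key access to one more dataset (no redeploy needed)."""
    acl.setdefault(api_key, set()).add(dataset)
```

The check itself is a single dictionary lookup plus a set membership test, so the cost lies mostly in fetching the mapping, not in evaluating it.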

Please let me know if there are any features on GCP that can help me achieve this.


Solution

  • When you perform access control, you have 2 parts: Authentication and Authorization.

    Cloud Endpoints is a good solution if you want to secure your API with a weak authentication secret (an API key). I wrote an article on this.

    Here, with your 3 clients, you will authenticate only 3 projects (not users, only projects). The API key value is also available to you in the query parameter. But this is only authentication.

    If you want an authorization layer that says who has access to what (here, Client A has access only to Dataset A), you have to code it yourself.

    In my company, we keep this data in Firestore: serverless, quick, and free (up to 50k reads per day).
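As a rough sketch of such an authorization layer, the class below reads the allowed datasets for an API key from a Firestore-style document store and caches the result briefly to keep the number of reads low. Everything specific here is an assumption for illustration: the class name `DatasetAcl`, the collection name `api_key_acl`, the `datasets` field, the 60-second TTL, and using the raw API key as the document ID (hashing it first may be preferable). The `db` argument is expected to behave like a `google.cloud.firestore.Client`, that is, to support `db.collection(name).document(doc_id).get()` returning a snapshot with `exists` and `to_dict()`.

```python
import time


class DatasetAcl:
    """Looks up which datasets an API key may use, with a short-lived cache."""

    def __init__(self, db, collection="api_key_acl", ttl_seconds=60,
                 clock=time.monotonic):
        self._db = db                  # Firestore-like client (assumption)
        self._collection = collection  # hypothetical collection name
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}               # api_key -> (expiry time, datasets)

    def allowed_datasets(self, api_key):
        now = self._clock()
        cached = self._cache.get(api_key)
        if cached is not None and cached[0] > now:
            return cached[1]
        # One document per API key, holding a "datasets" list (assumed schema).
        snapshot = self._db.collection(self._collection).document(api_key).get()
        data = snapshot.to_dict() if snapshot.exists else {}
        datasets = frozenset((data or {}).get("datasets", []))
        self._cache[api_key] = (now + self._ttl, datasets)
        return datasets

    def can_access(self, api_key, dataset):
        """Return True if the key is allowed to query the dataset."""
        return dataset in self.allowed_datasets(api_key)
```

In a Flask handler this might be used as `acl = DatasetAcl(firestore.Client())` followed by a `can_access(api_key, dataset)` check before running the query. Granting Client A access to Dataset B then means editing one document, and the change is picked up once the cache entry expires, with no redeploy.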