Filtering Datasets Based on OPA Policies

I have an access control model that is composed of two layers: Access Levels and Shared Permissions.

A user's Access Level governs the maximum set of permissions the user can have in the system, and it also grants some basic permissions to create top-level objects (portfolios, programs, projects, etc.). Objects can also be shared with a user by someone else in the system, which grants one or more permissions on that specific object. If an object was not created by you or assigned to you, then you should not be able to see it unless it has been explicitly shared with you. An example dataset would look something like this:

"access_levels": {
    "Worker": ["projects.view"],
    "Planner": ["projects.create", "projects.edit", "projects.view"]
},
"users_access_level": {
    "bob.jones@example.com": "Planner",
    "joe.humphreys@example.com": "Worker"
},
"resource_hierarchy": {
    "customer1": ["customer1"],
    "project1": ["project1", "customer1"],
    "project2": ["project2", "customer1"]
},
"resource_policyIDs": {
    "customer1": "1",
    "project1": "2",
    "project2": "3"
},
"policies": {
    "1": {
        "permissions": ["projects.create"],
        "subjects": ["users:joe.humphreys@example.com"]
    },
    "2": {
        "permissions": ["projects.view"],
        "subjects": ["users:joe.humphreys@example.com"]
    },
    "3": {}
}

and a policy would look something like this:

package authz

default authorized = false

authorized {
    input.method == "POST"
    http_path = ["programs", "create"]
    input.customerId == token.payload.customerId
    permitted_actions(token.payload.sub, input.customerId)[_] == "programs.create"
}

subjects_resource_permissions(sub, resource) = { perm |
    resource_ancestors := data.resource_hierarchy[resource]
    ancestor := resource_ancestors[_]
    id := data.resource_policyIDs[ancestor]
    data.policies[id]["subjects"][_] == sprintf("users:%v", [sub])
    perm := data.policies[id]["permissions"][_]
}

permitted_actions(sub, resource) = x {
    resource_permissions := subjects_resource_permissions(sub, resource)
    access_permissions := data.access_levels[data.users_access_level[sub]]

    # iterate over the array so the set holds individual permission strings
    perms := {p | p := access_permissions[_]}

    x = perms & resource_permissions
}

http_path := split(trim_left(input.path, "/"), "/")

Let's say I have created a Projects API to manage project resources. The API has a method to list projects, and the method should only return the projects the user has view access to. So in the example above, the user 'joe.humphreys@example.com' shouldn't be able to view Project 2 even though his access level gives him 'projects.view'. This is because it wasn't shared with him.

If I wanted to use OPA, how could I provide a general pattern to accomplish this across multiple APIs? What would the query to OPA look like? In other words, if I wanted to enforce this authorization in SQL, what would that look like?

I've read this article, but I'm having a hard time seeing how it fits here.

Solution

I'm assuming your two-layer model ANDs the 'Access Level' with the 'Shared Permission'. E.g., "joe" can see "project1" because "joe" is a Worker and therefore has the "projects.view" permission, AND "joe" is assigned to "project1" (via policy "2") with the permission "projects.view". Since "joe" is not assigned to "project2" via any policy with the "projects.view" permission, "joe" cannot see "project2". I.e., even if "joe" were assigned to "project2" via some policy, that policy must specify the "projects.view" permission; otherwise "joe" cannot see it.

You could write something like this to generate the set of project resources that the subject is allowed to see:

authorized_project[r] {

    # for some projects resource 'r', if...
    r := data.projects[_]

    # subject has 'projects.view' access, and...
    level := data.user_access_levels[input.sub]
    "projects.view" == data.access_level_permissions[level][_]

    # subject assigned to project resource (or any parents)
    x := data.resource_hierarchy[r.id][_]
    p := data.resource_policies[x]
    "projects.view" == data.policies[p].permissions[_]
    input.sub == data.policies[p].subjects[_]
}
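
To check that the rule behaves as described, here is a minimal unit-test sketch. It assumes the rule lives in package authz (as in your policy), is written in the same older Rego syntax as the rule, and mocks the data paths the rule reads. The names mock_policies and the two test rules are hypothetical, and subjects are stored without the "users:" prefix because the rule compares them to input.sub directly.

```rego
package authz

# Hypothetical mock data that follows the schema the rule above assumes.
mock_policies := {
    "1": {"permissions": ["projects.create"], "subjects": ["joe.humphreys@example.com"]},
    "2": {"permissions": ["projects.view"], "subjects": ["joe.humphreys@example.com"]},
    "3": {}
}

# joe is a Worker (projects.view) and only project1 is shared with him for viewing.
test_joe_sees_only_project1 {
    visible := authorized_project with input as {"sub": "joe.humphreys@example.com"}
        with data.projects as [{"id": "project1"}, {"id": "project2"}]
        with data.user_access_levels as {"joe.humphreys@example.com": "Worker"}
        with data.access_level_permissions as {"Worker": ["projects.view"]}
        with data.resource_hierarchy as {"project1": ["project1", "customer1"], "project2": ["project2", "customer1"]}
        with data.resource_policies as {"customer1": "1", "project1": "2", "project2": "3"}
        with data.policies as mock_policies
    visible == {{"id": "project1"}}
}

# bob's access level includes projects.view, but nothing is shared with him.
test_bob_sees_nothing {
    visible := authorized_project with input as {"sub": "bob.jones@example.com"}
        with data.projects as [{"id": "project1"}, {"id": "project2"}]
        with data.user_access_levels as {"bob.jones@example.com": "Planner"}
        with data.access_level_permissions as {"Planner": ["projects.create", "projects.edit", "projects.view"]}
        with data.resource_hierarchy as {"project1": ["project1", "customer1"], "project2": ["project2", "customer1"]}
        with data.resource_policies as {"customer1": "1", "project1": "2", "project2": "3"}
        with data.policies as mock_policies
    count(visible) == 0
}
```

Tests like these can be run with opa test against the directory holding the policy and test files.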

This raises the question of how data.projects, data.policies, and data.resource_hierarchy get populated. (I'm assuming the Access Level data sets are much smaller, but the same question could apply to those.) The blog post you linked discusses answers to that question. Note that passing the data via input instead of data does not really change anything; it still needs to be available on every query.

You could refactor the example above and make it slightly more readable:

authorized_project[r] {
    r := data.projects[_]
    subject_access_level[[input.sub, "projects.view"]]
    subject_shared_permission[[input.sub, "projects.view", r.id]]
}

subject_access_level[[sub, perm]] {
    some sub
    level := data.user_access_levels[sub]
    perm := data.access_level_permissions[level][_]
}

subject_shared_permission[[sub, perm, resource]] {
    some resource
    x := data.resource_hierarchy[resource][_]
    p := data.resource_policies[x]
    perm := data.policies[p].permissions[_]
    sub := data.policies[p].subjects[_]
}

You could generalize the above as follows:

authorized_resource[[r, kind]] {
    r := data.resources[kind][_]
    perm := sprintf("%v.view", [kind])
    subject_access_level[[input.sub, perm]]
    subject_shared_permission[[input.sub, perm, r.id]]
}

subject_access_level[[sub, perm]] {
    some sub
    level := data.user_access_levels[sub]
    perm := data.access_level_permissions[level][_]
}

subject_shared_permission[[sub, perm, resource]] {
    some resource
    x := data.resource_hierarchy[resource][_]
    p := data.resource_policies[x]
    perm := data.policies[p].permissions[_]
    sub := data.policies[p].subjects[_]
}
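
If each API only needs the IDs it is allowed to return, one possible addition is a rule that groups the authorized resource IDs by kind. This is only a sketch: authorized_ids is a hypothetical name, and it assumes data.resources maps each kind (for example "projects") to an array of objects with an id field, as the generalized rule above does.

```rego
# Hypothetical helper: maps each resource kind to the set of IDs the
# subject in input.sub is allowed to view.
authorized_ids[kind] = ids {
    # iterate over every kind present in data.resources
    data.resources[kind]

    # collect the IDs of the authorized resources of that kind
    ids := {r.id | authorized_resource[[r, kind]]}
}
```

Assuming the rules live in package authz, a list endpoint for projects could then query data.authz.authorized_ids.projects with the subject supplied as input.sub and use the returned IDs to filter its own results.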