Search code examples
clickhouseuser-managementclickhouse-client

Clickhouse row policy and default user behavior


Problem: Created User with role and row policies do not see data from all shards unless default user is granted the same row policies. Setup is the following: I have clickhouse cluster, 3 shards(2 replicas each shard).

 <access_management>1</access_management>

is turned on to enable user creation. Embedded click-house keeper is enabled.


CREATE USER admin_user GRANTEES ANY ON CLUSTER '{cluster}' IDENTIFIED WITH plaintext_password BY 'admin';
CREATE ROLE admin_role ON CLUSTER '{cluster}';
GRANT admin_role TO admin_user ON cluster '{cluster}';
GRANT ON CLUSTER '{cluster}' SELECT ON testdb.* TO admin_role; 

CREATE ROW POLICY IF NOT EXISTS  
  admin_rp_local_table ON CLUSTER '{cluster}' ON testdb.local_table
USING 1 TO admin_role;

CREATE ROW POLICY IF NOT EXISTS  
  admin_rp_distributed_table ON CLUSTER '{cluster}' ON testdb.distributed_table
USING 1 TO admin_role;

Is executed to create user,role and row policy.

After inserting data into distributed table and trying to select it from distributed table as admin user I get result from one shard(current).

I gave row policies to default user and admin user starts to see data from all shards.

CREATE ROW POLICY IF NOT EXISTS  
  default_rp_local_table ON CLUSTER '{cluster}' ON testdb.local_table
USING 1 to default;

CREATE ROW POLICY IF NOT EXISTS  
  default_rp_distributed_table ON CLUSTER '{cluster}' ON testdb.distributed_table
USING 1 to default;

Why granting row policies to default user affects shards visibility of admin user in a cluster? (Data from all shards will be in a result set)

Added example of the situation:

We create table 'local_table' with columns id, name. Created distributed_table from local_table. Insert following data via distributed_table: (1:Bob),(2:Fred),(3:Greg). First shard local table has (1:Bob). Second shard local table has (2:Fred). Third shard local table has (3:Greg).

Added admin_role, added admin_user, granted select for admin_user. Now default and admin_user see data - (1:Bob),(2:Fred),(3:Greg) when selected from distributed table.

If we add row policy to admin_user now and make select from distributed table: default sees no data(expected, local and distributed) and admin sees only - (1:Bob) (data from local table but should see data from all shards).

if now we add the same row policy but for DEFAULT user: default see (1:Bob),(2:Fred),(3:Greg) admin see (1:Bob),(2:Fred),(3:Greg). Why so? is it a bug?


Solution

  • Embedded click-house keeper is enabled. This was the root of the problem. In cluster setup Clickhouse keeper uses credentials to connect to other shards(when we make query via distributed table) in order to get the data. So when we make a query from distributed table the query to local shard is executed as a current user(the one we used to connect to database), then keeper is using default user to query other shards. That is why in my example after setting row policies for admin user and making a query as that user only first(local) shard data is shown. And that is why if we set row policies for default user the query for admin user works correctly. Creating a new user with all 'permissions' only for keeper may be a solution for this situation. Moral of the story - I should read the documentation of the tools I am using more carefully).

    user | username that will be used to authenticate to the cluster instances |default
    
    password | password for the user define to allow connections to cluster instances | ClickHouse123!
    

    https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper