I am setting up Sentry on Kubernetes with a ClickHouse distributed table. All components of the deployment seem to be working except sentry-snuba-outcomes-consumer,
which is throwing this exception:
snuba.clickhouse.errors.ClickhouseWriterError: Method write is not supported by storage Distributed with more than one shard and no sharding key provided (version 21.8.13.6 (official build))
I have looked around the config and the docs but I cannot work out how to provide the sharding key. I am completely new to ClickHouse and Snuba.
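For reference, the sharding key (or the lack of one) lives in each Distributed table's engine definition, so you can list the affected tables with a standard system-tables query (this should work on any node of the cluster):

SELECT database, name, engine_full
FROM system.tables
WHERE engine = 'Distributed';

Any table whose Distributed(...) clause has only three arguments (cluster, database, table) was created without a sharding key, which is what the error above is complaining about.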
This is my current ClickHouse config:
config.xml: |-
  <?xml version="1.0"?>
  <yandex>
    <path>/var/lib/clickhouse/</path>
    <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
    <user_files_path>/var/lib/clickhouse/user_files/</user_files_path>
    <format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>
    <include_from>/etc/clickhouse-server/metrica.d/metrica.xml</include_from>
    <users_config>users.xml</users_config>
    <display_name>sentry-clickhouse</display_name>
    <listen_host>0.0.0.0</listen_host>
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
    <interserver_http_port>9009</interserver_http_port>
    <max_connections>4096</max_connections>
    <keep_alive_timeout>3</keep_alive_timeout>
    <max_concurrent_queries>100</max_concurrent_queries>
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>
    <mark_cache_size>5368709120</mark_cache_size>
    <timezone>UTC</timezone>
    <umask>022</umask>
    <mlock_executable>false</mlock_executable>
    <remote_servers>
      <sentry-clickhouse>
        <shard>
          <replica>
            <internal_replication>true</internal_replication>
            <host>sentry-clickhouse-0.sentry-clickhouse-headless.NAMESPACE.svc.cluster.local</host>
            <port>9000</port>
            <user>...</user>
            <compression>true</compression>
          </replica>
        </shard>
        <shard>
          <replica>
            <internal_replication>true</internal_replication>
            <host>sentry-clickhouse-1.sentry-clickhouse-headless.NAMESPACE.svc.cluster.local</host>
            <port>9000</port>
            <user>...</user>
            <compression>true</compression>
          </replica>
        </shard>
        <shard>
          <replica>
            <internal_replication>true</internal_replication>
            <host>sentry-clickhouse-2.sentry-clickhouse-headless.NAMESPACE.svc.cluster.local</host>
            <port>9000</port>
            <user>...</user>
            <compression>true</compression>
          </replica>
        </shard>
      </sentry-clickhouse>
    </remote_servers>
    <zookeeper incl="zookeeper-servers" optional="true" />
    <macros incl="macros" optional="true" />
    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>
    <max_session_timeout>3600</max_session_timeout>
    <default_session_timeout>60</default_session_timeout>
    <disable_internal_dns_cache>1</disable_internal_dns_cache>
    <query_log>
      <database>system</database>
      <table>query_log</table>
      <partition_by>toYYYYMM(event_date)</partition_by>
      <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
    <query_thread_log>
      <database>system</database>
      <table>query_thread_log</table>
      <partition_by>toYYYYMM(event_date)</partition_by>
      <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_thread_log>
    <distributed_ddl>
      <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>
    <logger>
      <level>trace</level>
      <log>/var/log/clickhouse-server/clickhouse-server.log</log>
      <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
      <size>1000M</size>
      <count>10</count>
    </logger>
  </yandex>
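As a sanity check (system.clusters is a standard ClickHouse system table, so this should apply here as-is), the remote_servers section above should show up as three single-replica shards:

SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'sentry-clickhouse';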
And these are my Snuba settings:
import os
from snuba.settings import *
env = os.environ.get
DEBUG = env("DEBUG", "0").lower() in ("1", "true")
# Clickhouse Options
SENTRY_DISTRIBUTED_CLICKHOUSE_TABLES = True
CLUSTERS = [
    {
        "host": env("CLICKHOUSE_HOST", "sentry-clickhouse"),
        "port": int(9000),
        "user": env("CLICKHOUSE_USER", "..."),
        "password": env("CLICKHOUSE_PASSWORD", "..."),
        "database": env("CLICKHOUSE_DATABASE", "..."),
        "http_port": 8123,
        "storage_sets": {
            "cdc",
            "discover",
            "events",
            "events_ro",
            "metrics",
            "migrations",
            "outcomes",
            "querylog",
            "sessions",
            "transactions",
            "profiles",
            "functions",
            "replays",
            "generic_metrics_sets",
            "generic_metrics_distributions",
            "search_issues",
            "generic_metrics_counters",
            "spans",
            "group_attributes",
        },
        "single_node": False,
        "cluster_name": "sentry-clickhouse",
        "distributed_cluster_name": "sentry-clickhouse",
        "sharding_key": "cdc",
    },
]
Can someone please help me understand how to specify the sharding_key? I added "sharding_key": "cdc" and SENTRY_DISTRIBUTED_CLICKHOUSE_TABLES = True as shown above just to see if it would solve the problem, but the error persists. Neither the ClickHouse nor the Snuba documentation is very clear on how to specify the sharding_key in the configuration.
I had exactly the same problem. Take a look at this: https://github.com/sentry-kubernetes/charts/issues/1042
You will need to recreate the distributed tables with a sharding key. I used this:
clickhouse-client --host 127.0.0.1
CREATE OR REPLACE TABLE metrics_raw_v2_dist AS metrics_raw_v2_local ENGINE = Distributed('sentry-clickhouse', default, metrics_raw_v2_local, timeseries_id);
After that, restart the metrics consumer and you should see all the data flowing in.
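Since the consumer failing in the question is outcomes rather than metrics, the same recreate-with-a-sharding-key step presumably needs to be applied to the outcomes distributed tables as well. A sketch of what that could look like, run in clickhouse-client as above; the outcomes_* table names and the org_id sharding key are my reading of Snuba's outcomes migrations, so verify both with SHOW CREATE TABLE before running anything:

-- Assumed table names and sharding key; confirm with SHOW CREATE TABLE outcomes_raw_local
CREATE OR REPLACE TABLE outcomes_raw_dist AS outcomes_raw_local
ENGINE = Distributed('sentry-clickhouse', default, outcomes_raw_local, org_id);
CREATE OR REPLACE TABLE outcomes_hourly_dist AS outcomes_hourly_local
ENGINE = Distributed('sentry-clickhouse', default, outcomes_hourly_local, org_id);

Then restart sentry-snuba-outcomes-consumer instead of the metrics consumer.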