Apache Pinot server component consumes an unexpected amount of memory

Problem Description:

Docker is used to deploy Apache Pinot on production servers (VMs).

Pinot's official documentation has been followed for this purpose.

What has been done

Pinot servers consume more memory than our data size multiplied by the replication factor would account for.

We have tried the following:

  • Setting the -Xms and -Xmx JVM flags in the JAVA_OPTS environment variable
  • Setting up monitoring on the machines to gain observability
  • Removing indexes (such as the inverted index) from the table definition

[Screenshot: table size]

[Screenshot: Node Exporter metrics]

System Specification: we have 3 servers, 2 controllers and 2 brokers with the following specifications:

  • 24-core CPU
  • 64 gigabytes of memory
  • 738 gigabytes of SSD storage

Sample docker-compose file from one of the servers:

    version: '3.7'
    services:
      server1:
        image: apachepinot/pinot:0.11.0
        command: "StartServer -clusterName bigdata-pinot-ansible -zkAddress <zk-address> -configFileName /server.conf"
        restart: unless-stopped
        hostname: server1
        container_name: server1
        ports:
          - "8096-8099:8096-8099"
          - "9000:9000"
          - "8008:8008"
        environment:
          JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx20G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml"
        volumes:
          - ./server.conf:/server.conf
          - ./data/server_data/segment:/var/pinot/server/data/segment
          - ./data/server_data/index:/var/pinot/server/data/index

Table config:

    {
      "tableName": "<table-name>",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "schemaName": "<schema-name>",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "60",
        "replication": "3",
        "timeColumnName": "date",
        "allowNullTimeValue": false,
        "replicasPerPartition": "3",
        "segmentPushType": "APPEND",
        "completionConfig": {
          "completionMode": "DOWNLOAD"
        }
      },
      "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant",
        "tagOverrideConfig": {
          "realtimeCompleted": "DefaultTenant_OFFLINE"
        }
      },
      "tableIndexConfig": {
        "noDictionaryColumns": [...],
        "rangeIndexColumns": [...],
        "rangeIndexVersion": 1,
        "autoGeneratedInvertedIndex": false,
        "createInvertedIndexDuringSegmentGeneration": false,
        "sortedColumn": [...],
        "bloomFilterColumns": [],
        "loadMode": "MMAP",
        "onHeapDictionaryColumns": [],
        "varLengthDictionaryColumns": [],
        "enableDefaultStarTree": false,
        "enableDynamicStarTreeCreation": false,
        "aggregateMetrics": false,
        "nullHandlingEnabled": false
      },
      "metadata": {},
      "routing": {
        "instanceSelectorType": "strictReplicaGroup"
      },
      "query": {},
      "fieldConfigList": [],
      "upsertConfig": {
        "mode": "FULL",
        "hashFunction": "NONE"
      },
      "ingestionConfig": {
        "streamIngestionConfig": {
          "streamConfigMaps": [
            {
              "streamType": "kafka",
              "": "<topic-name>",
              "": "<kafka-brokers-list>",
              "stream.kafka.consumer.type": "lowlevel",
              "": "smallest",
              "": "",
              "": "",
              "stream.kafka.decoder.prop.format": "JSON",
              "realtime.segment.flush.threshold.rows": "0",
              "realtime.segment.flush.threshold.time": "1h",
              "realtime.segment.flush.segment.size": "300M"
            }
          ]
        }
      },
      "isDimTable": false
    }

server.conf file:


After ingesting data from the real-time stream (Kafka in our case), the data grows in memory and the containers are killed with an OOMKilled error. [Screenshot: server components status issue]

We have no idea what is happening on the servers. Could someone help us find the root cause of this problem?

P.S. 1: To follow the complete process of how Pinot is deployed, see this repository on GitHub.

P.S. 2: As we understand it, the size of the data in Pinot can be estimated with the following formula:

Data size = (daily data size) * (retention period in days) * (replication factor)

For example, with a retention of 2 days, approximately 2 gigabytes of new data per day, and a replication factor of 3, the data size is about 2 * 2 * 3 = 12 gigabytes in total across the cluster.
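The formula and the worked example can be checked with a small Python helper. This is a minimal sketch that only encodes the formula stated above; the function names `pinot_data_size_gb` and `per_server_gb` are hypothetical, and the per-server figure assumes segments are spread evenly across the three servers, which is an idealization.

```python
def pinot_data_size_gb(daily_gb: float, retention_days: float, replication_factor: int) -> float:
    """Data size = (daily data size) * (retention period in days) * (replication factor)."""
    if daily_gb < 0 or retention_days < 0 or replication_factor < 1:
        raise ValueError("sizes must be non-negative and replication_factor >= 1")
    return daily_gb * retention_days * replication_factor


def per_server_gb(total_gb: float, num_servers: int) -> float:
    """Hypothetical helper: assumes segments are spread evenly over servers."""
    return total_gb / num_servers


# Worked example from the question: ~2 GB/day, 2-day retention, replication factor 3.
total = pinot_data_size_gb(daily_gb=2, retention_days=2, replication_factor=3)
print(f"Total across the cluster: {total:g} GB")                   # Total across the cluster: 12 GB
print(f"Per server (3 servers): {per_server_gb(total, 3):g} GB")   # Per server (3 servers): 4 GB
```

This estimates stored data only. It does not account for JVM heap usage or memory-mapped segments, so the actual memory used by a server process can exceed it.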


  • As described in the question, the problem lies in how the table was created, not in Apache Pinot itself. Apache Pinot keeps the primary keys used for upserts on the JVM heap, so heap usage grows with the number of distinct keys ingested. To scale, increase the number of Kafka partitions. According to the documentation, the default upsert mode is NONE, whereas this table sets "mode": "FULL".
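The heap argument above can be made concrete with a rough estimate. The following Python sketch is illustrative only. It assumes keys are spread evenly over partitions, that each hosted partition replica keeps its own primary-key map on heap, a hypothetical round-robin placement of replicas on servers, and an assumed average of `bytes_per_key` per map entry. The function names and the 300-million-key workload are hypothetical, not Pinot APIs or measured values.

```python
from collections import Counter


def assign_replicas(num_partitions: int, num_servers: int, replication: int) -> Counter:
    """Count partition replicas per server under a hypothetical round-robin placement.

    Replica r of partition p goes to server (p + r) % num_servers, so the replicas
    of one partition always land on distinct servers. Pinot's real instance
    assignment may differ; this only feeds the estimate below.
    """
    if replication > num_servers:
        raise ValueError("replication cannot exceed the number of servers")
    hosted = Counter()
    for p in range(num_partitions):
        for r in range(replication):
            hosted[(p + r) % num_servers] += 1
    return hosted


def max_upsert_heap_gib(total_keys: int, bytes_per_key: int, num_partitions: int,
                        num_servers: int, replication: int) -> float:
    """Estimate heap (GiB) used by upsert key maps on the most loaded server.

    Assumes keys are evenly distributed over partitions and each hosted partition
    replica keeps its own key map on heap. bytes_per_key is an assumed average
    per entry (key plus map overhead), not a Pinot constant.
    """
    keys_per_partition = total_keys / num_partitions
    busiest = max(assign_replicas(num_partitions, num_servers, replication).values())
    return busiest * keys_per_partition * bytes_per_key / 1024**3


if __name__ == "__main__":
    # Hypothetical workload: 300 million distinct primary keys, ~64 bytes per entry.
    for partitions, servers in [(3, 3), (12, 3), (12, 6)]:
        heap = max_upsert_heap_gib(300_000_000, 64, partitions, servers, replication=3)
        print(f"{partitions:>2} partitions, {servers} servers: ~{heap:.1f} GiB on the busiest server")
```

In this sketch, with three servers and a replication factor of 3 (the setup described in the question), every server hosts a replica of every partition, so raising the partition count alone leaves the busiest server's heap unchanged; the per-server share drops only once there are more servers than replicas.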