Tags: database-performance, cql, cassandra-2.0, nodetool

Understanding "Number of keys" in nodetool cfstats


I am new to Cassandra. In this example I am using a cluster with 1 DC and 5 nodes, and NetworkTopologyStrategy with a replication factor of 3.

   Keyspace: activityfeed
            Read Count: 0
            Read Latency: NaN ms.
            Write Count: 0
            Write Latency: NaN ms.
            Pending Tasks: 0
                    Table: feed_shubham
                    SSTable count: 1
                    Space used (live), bytes: 52620684
                    Space used (total), bytes: 52620684
                    SSTable Compression Ratio: 0.3727660543119897
                    Number of keys (estimate): 137984
                    Memtable cell count: 0
                    Memtable data size, bytes: 0
                    Memtable switch count: 0
                    Local read count: 0
                    Local read latency: 0.000 ms
                    Local write count: 0
                    Local write latency: 0.000 ms
                    Pending tasks: 0
                    Bloom filter false positives: 0
                    Bloom filter false ratio: 0.00000
                    Bloom filter space used, bytes: 174416
                    Compacted partition minimum bytes: 771
                    Compacted partition maximum bytes: 924
                    Compacted partition mean bytes: 924
                    Average live cells per slice (last five minutes): 0.0
                    Average tombstones per slice (last five minutes): 0.0

What does "Number of keys" mean here? I have 5 different nodes in my cluster, and after running the command below on each node separately I get different statistics for the same table.

nodetool cfstats -h 192.168.1.12 activityfeed.feed_shubham

From the output above, I gather that cfstats reports statistics about the physical storage of data on each node.

I went through the doc at http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFstats.html but did not find an explanation of the number of keys there.

I am using RandomPartitioner.

Does this key have anything to do with the partition key?

I have around 200000 records in my table.


Solution

  • The number of keys represents the number of partition keys stored on that node for the table. It is only an estimate, though, and how accurate it is depends on your version of C*. Before 2.1.6 it summed the number of partitions listed in the index file of each SSTable, so a partition present in several SSTables was counted once per SSTable. From 2.1.6 onward it merges a sketch of the data (HyperLogLog) that is stored per SSTable.
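
To make the two estimation approaches concrete, here is a minimal Python sketch; it is not Cassandra's code. It assumes every record has its own partition key and that tokens are spread evenly across nodes. It models each SSTable as a plain set of keys, and an exact set union stands in for the HyperLogLog merge. All function names and sample keys are hypothetical.

```python
# Rough per-node expectation, ASSUMING one partition per record and an
# even token distribution (neither is guaranteed in a real cluster).
def expected_keys_per_node(total_partitions, replication_factor, node_count):
    return total_partitions * replication_factor / node_count


# Toy SSTables modelled as sets of partition keys (hypothetical data).
# "user3" appears in both, as happens when a partition is written again
# after an earlier flush.
sstable_a = {"user1", "user2", "user3"}
sstable_b = {"user3", "user4"}


def summed_estimate(sstables):
    # Older style: add up per-SSTable partition counts.
    # A key present in several SSTables is counted once per SSTable.
    return sum(len(keys) for keys in sstables)


def merged_estimate(sstables):
    # Newer style: merge per-SSTable summaries, then count distinct keys.
    # An exact set union is used here in place of a HyperLogLog sketch.
    return len(set().union(*sstables))


if __name__ == "__main__":
    print(expected_keys_per_node(200000, 3, 5))
    print(summed_estimate([sstable_a, sstable_b]))
    print(merged_estimate([sstable_a, sstable_b]))
```

Under those assumptions, 200000 partitions with a replication factor of 3 on 5 nodes works out to roughly 120000 partitions per node. That is the same order of magnitude as the 137984 reported above, and it is why each node shows a different figure. In the toy data the summed count is 5 while the merged count is 4, which shows how the older method can overestimate once a table has more than one SSTable.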