Performance difference between YSQL and YCQL

Based on the document "Benchmarking Distributed SQL Databases", one sees that throughput is higher in YCQL than in YSQL.

If I use the same table structure and the same tool for the inserts, and I am not using any SQL-like features, why does YCQL perform better than YSQL?

Solution

This could be because of a few differences between YCQL and YSQL. Note that these differences are not fundamental to the architecture; they manifest because YSQL started with the PostgreSQL code for the upper half of the DB. Many of these areas are being enhanced.

One-hop optimization: YCQL is shard-aware and knows how the underlying DB (called DocDB) shards and distributes data across nodes. This means it can “hop” directly to the node that contains the data when using PREPARE-BIND statements. YSQL cannot do this today because it requires a JDBC-level protocol change; this work is being done in the jdbc-yugabytedb project.
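
The routing idea above can be sketched with a toy model. Everything here is an assumption made for illustration: the three node names, the hash ranges, and the CRC32-based hash are hypothetical and are not DocDB's actual sharding function or tablet layout. The sketch only shows why a client that can compute the owning node itself saves a forwarding hop.

```python
import zlib

# Hypothetical toy cluster: three nodes, each owning a contiguous slice of a
# 16-bit hash space. Real tablet placement in DocDB is more involved.
TABLETS = [
    (0x0000, 0x5555, "node-1"),
    (0x5556, 0xAAAA, "node-2"),
    (0xAAAB, 0xFFFF, "node-3"),
]


def owner_of(key: str) -> str:
    """Return the node whose hash range contains the key (toy hash, not DocDB's)."""
    h = zlib.crc32(key.encode("utf-8")) & 0xFFFF
    for low, high, node in TABLETS:
        if low <= h <= high:
            return node
    raise AssertionError("hash ranges must cover the whole space")


def hops_shard_aware(key: str) -> int:
    """A shard-aware client computes the owner itself and contacts it directly."""
    owner_of(key)  # routing decision is made on the client side
    return 1


def hops_shard_unaware(key: str, contacted_node: str) -> int:
    """A shard-unaware client contacts some node, which forwards if it is not the owner."""
    return 1 if contacted_node == owner_of(key) else 2
```

In this model, a shard-unaware client pays the second hop whenever the node it happens to contact does not own the key, which becomes more likely as the number of nodes grows.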

Threads instead of processes: YCQL uses threads to handle incoming client queries/statements, while YSQL (like the PostgreSQL code it is based on) uses processes. Processes are heavier weight, which could affect throughput in certain scenarios and connection scalability in others. This is another planned enhancement.

Upsert vs. insert: In YCQL, each insert is treated by default as an upsert (update or insert, without having to check the existing value), and special syntax such as INSERT ... IF NOT EXISTS is needed to perform a pure insert. In YSQL, each insert must read the data before writing, because an insert fails if the key already exists.
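
The difference in work per statement can be illustrated with a toy in-memory store. This is a sketch under stated assumptions: `ToyStore` and its `reads` counter are hypothetical names invented here, and the model ignores everything a real database does (replication, transactions, storage). It only shows that a strict insert needs a read before the write, while an upsert does not.

```python
class ToyStore:
    """Hypothetical key-value store that counts reads done on the write path."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def upsert(self, key, value):
        """Blind write: create or overwrite without looking at the existing row."""
        self.rows[key] = value

    def insert(self, key, value):
        """Strict insert: read first, and fail if the key already exists."""
        self.reads += 1  # existence check before the write
        if key in self.rows:
            raise KeyError(f"duplicate key: {key!r}")
        self.rows[key] = value


store = ToyStore()
store.upsert("k1", "a")
store.upsert("k1", "b")   # overwrites silently, no read needed
store.insert("k2", "c")   # one read to confirm k2 is absent
```

After these three statements the store has performed exactly one read, and it came from the strict insert. A second `store.insert("k2", ...)` would raise `KeyError`, mirroring the duplicate-key failure described above.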

More work has gone into YCQL performance: As of the end of 2019, the focus for YSQL has been only on correctness and functionality, while YCQL performance has been worked on quite a bit. Although performance work on YSQL has only just started, it is possible to improve performance relatively quickly because of the underlying architecture.