Tags: python, amazon-ec2, flask, flask-restful, crate

Crate AMI Performance Lower With Flask-RESTful Endpoint on EC2


I launched an EC2 instance from the simple Crate AMI and opened port 4200 for Crate and port 5000 for Flask.

When I query Crate directly on the EC2 instance, the queries are slower but still fast enough (about 1-2 seconds). When I send the same query through the Flask endpoint, which passes it to the Crate DB on the same instance, it takes close to 10 seconds.

I tested the endpoint on localhost and saw no comparable change in execution speed, so I've ruled out the code as the problem.
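To make the comparison repeatable, the same query can be timed both directly against Crate and through the Flask endpoint. The following is a minimal sketch, not the code from this setup: it assumes Crate's HTTP endpoint is reachable at `/_sql` on port 4200, and the Flask route `/query` is a hypothetical placeholder for whatever route the endpoint actually exposes.

```python
import json
import statistics
import time
import urllib.request


def median_seconds(call, repeats=5):
    """Run call() `repeats` times and return the median wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def query_crate(stmt, url="http://localhost:4200/_sql"):
    """POST a SQL statement to Crate's HTTP endpoint and return the parsed JSON."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def query_flask(url="http://localhost:5000/query"):
    """GET the Flask endpoint; the '/query' route is an assumption."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

Calling `median_seconds(lambda: query_crate(stmt))` and `median_seconds(query_flask)` on the instance itself, with `stmt` set to the query in question, separates time spent in Crate from time added by the Flask layer.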

My questions:

  • Why are queries run through the Flask-RESTful endpoint on EC2 so slow?
  • Does building an EC2 AMI from scratch and installing CrateDB on it perform differently from using the out-of-the-box Crate AMI?

Solution

  • This could be one of several things, but I mostly suspect a 'hardware' issue:

    • Are the hardware specs the same? Number of cores, amount of memory, SSD vs. spinning disks?
    • Is the environment variable CRATE_HEAP_SIZE (set in /etc/sysconfig/crate) set to half the available RAM?
    • Is the CREATE TABLE statement the same? A different number of cores results in a different number of shards if the shard count is not specified. Oversharding or undersharding will degrade performance noticeably.

    I am assuming the table size and queries are the same ;) Otherwise, seemingly minor changes can make a noticeable difference in performance. Queries on partitioned tables are optimized when the partition column appears in the WHERE clause, and queries that hit the primary key(s) directly are much faster. Similarly, aggregations and comparisons on strings are slower than on numeric types.
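The heap-size and shard-count checks above can be sketched in a few lines of Python. This is an illustration under assumptions, not a recommended configuration: the helper names are hypothetical, the heap value is written in megabytes with an `m` suffix (verify the accepted format for your Crate version), and `total_ram_bytes` relies on `os.sysconf`, so it only works on Linux-like systems.

```python
import os


def total_ram_bytes():
    """Physical RAM in bytes, read via sysconf (Linux-like systems only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def half_ram_heap_size(ram_bytes):
    """Return half of the given RAM as a heap size string in megabytes."""
    half_mb = ram_bytes // 2 // (1024 * 1024)
    return f"{half_mb}m"


def create_table_stmt(table, columns_sql, shards):
    """Build a CREATE TABLE statement with an explicit shard count,
    so the result does not depend on the number of cores of the node."""
    return f"CREATE TABLE {table} ({columns_sql}) CLUSTERED INTO {shards} SHARDS"
```

Comparing `half_ram_heap_size(total_ram_bytes())` with the CRATE_HEAP_SIZE value in /etc/sysconfig/crate on each machine shows whether the heap setting differs, and creating the table with the same explicit shard count on both removes one variable from the comparison.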

    Cheers, Claus