Search code examples
amazon-web-servicesamazon-redshiftamazon-aurora

AWS Redshift Federated Query to MySQL is slower than when executing directly in Mysql


Hlleo guys. I am new to Redshift.

I am using Redshift Federated Query to get data from my Aurora Mysql. I have table TbTest on Mysql like:

CREATE TABLE `TbTest` (
  `colA` varchar(50) NOT NULL,
  `colB` varchar(50) NOT NULL,
  `colC` varchar(2) DEFAULT NULL,
  PRIMARY KEY (`colA`,`colB`)
)

When I execute query in Mysql :

SELECT
        *
    FROM
        TbTest AS t
    WHERE
        t.colA= 'ValueA'
        AND t.colB= 'ValueB'

I can get the data in almost 1ms only. (I check the execution plan, it is using Primary Index)

But when I try the same SQL on Redshift. It takes around 5 seconds to get that record. And below is the execution plan.

XN MySQL Query Scan "TbTest"  (cost=0.00..9533.38 rows=16 width=1324)
  ->  Remote MySQL Seq Scan test."TbTest"  (cost=0.00..9533.22 rows=16 width=1324)
        Filter: ((("colA")::text = 'ValueA'::text) AND (("colB")::text = 'ValueB'::text))

I saw that there is a Seq Scan instead of using index of table. How could I use the index of my Mysql table when executing it in Redshift.

2023/12/06 Addition information:

I can get the SQL from SVL_FEDERATED_QUERY:

SELECT "column1" FROM test."a" WHERE ( BINARY rtrim( CAST("key1" AS char)) =  BINARY rtrim(CAST( 'value1' AS char))) AND (( BINARY rtrim( CAST("key2" AS char)) =  BINARY rtrim(CAST( 'value2' AS char))) OR ( BINARY rtrim( CAST("key2" AS char)) =  BINARY rtrim(CAST( 'value3' AS char)))) LIMIT 10000 

So I decided to create an index look like below. But the cost of query is not changed.

KEY "idx_Test_binary" ((BINARY rtrim( CAST("key1" AS char))),(BINARY rtrim( CAST("key2" AS char))),"column1")

Please help me.


Solution

  • You need to check what is the exact query that is been executed on MySql, there is a monitoring view called SVL_FEDERATED_QUERY that display the query. In federated queries sent to PostgreSQL redshift add an Rtrim function to varchar columns thus preventing an index seek, the solution was to add a corespondent index on PostgreSQL with an Rtrim, resulting an index seek and getting really good performance ( there is off course a price with adding index but it depends if you can “afford“ it). Keep in mind that the performance of federated query can never be the same as the source table as redshift need to run the query, get the result over the network, insert it to a table within redshift and select from that table. But from our usages in that feature I can say that all the above process is happening very fast.