Search code examples
amazon-web-servicespysparkpowerbiamazon-athenadirectquery

How do I replace AWS Athena in my PowerBI DirectQuery structure?


I have built 14 AWS Glue jobs (using Pyspark) that retrieve data from a few sources, partition the data by a "uuid" field and write them to S3 in parquet files to be consumed through AWS Athena.

The goal of this pipeline is to make data available for some PowerBI reports that can be quite complicated. We are currently using DirectQuery, as we wanna scale our operation even more, and that'll grant us less headache when pushing hotfixes/changes in our reports.

The problem is that this DirectQuery connection using AWS Athena is looking very slow. And I really mean it. The Athena Connector in PowerBI is basically an ODBC.

I am looking for any options that would enable us to use PowerBI DirectQuery, leverage the Glue jobs I created and still have the performance we want in our reports.

Disclaimer: I know Microsoft has released a "Microsoft Fabric" feature that allows us to build data treatment pipelines and a Lakehouse, just as I did with AWS Glue & Athena, but their prices are just insane, and I'm not willing to pay that much for such poor capacity.

Currently, I tried optimizing our queries, partitioning our data and optimizing our DAX metrics, as they're the time-consumers in the process now. But still, I feel like Athena could be something that's holding us back.


Solution

  • To be honest, I just left Athena and got a dedicated server. Issue resolved!