Search code examples
sqldatabasepowerbipowerquerym

Prevent remote database to be queried multiple times in PowerBI


What is the best approach / design to query a remote data table in Power BI when multiple following queries depend on the queried data table to prevent the remote data table to queried multiple times?


Details

Let's say I have a remote data base server my_db_server with a database my_db which I query for a specific data table my_source_data_table. Afterwards I am using two other queries in PowerBI which depend on the initial query of the remote data table. I want to execute the query of the remote table only ONCE, and then use the cached data in an efficient way for the other two queries to prevent querying the remote database server multiple times.


Step 1 - Access remote database

Hence I add the following query in PowerBI to access the remote table

let
    Source = Sql.Databases("<my_db_server>"),
    Database = Source{[Name="<my_db>"]}[Data],
    MySourceDataTable = Database {[Schema="dbo",Item="<my_source_datatable>"]}[Data],
in
    MySourceDataTable 

Step 2 - Depending queries

I now have two sub queries, which use this table as input and query for them, e.g.

Query A - Order columns by Foo,Bar

let
    Source = MySourceDataTable,
    ReorderColumnsFooBar = Table.ReorderColumns(Source ,{"Foo", "Bar"})
in
    ReorderColumnsFooBar

Query B - Order columns by Bar, Foo

let
    Source = MySourceDataTable,
    ReorderColumnsBarFoo = Table.ReorderColumns(Source ,{"Bar", "Foo"})
in
    ReorderColumnsBarFoo 

How does PowerBI proceed in this case? Is it like:

  1. Query remote database to resolve MySourceDataTable

  2. Resolve ReorderColumnsFooBar using cached MySourceDataTable

  3. Resolve ReorderColumnsBarFoo using cached MySourceDataTable

Does it use a cached version of 1) or are steps 2) and 3) each executing query 1) again? Can I improve the situation using a Table.Buffer(MySourceDataTable) at the end of 1) to explicitly cache the table?

Testing these different approaches didn't seem to bring a boost in speed. Currently my whole query takes about 90sec and I would like to speed it up, since the remote query takes up 80% of the time.


Solution

  • Power Query streams data each time from the source. For instance, if you have two queries which both get data from a DB and the data changes in the split second between the execution of query 1 and query 2, you will get different data extracted.

    Table.Buffer only caches within a single query so to answer your question, steps 2 & 3 both execute query 1 again.

    You can read more background here: https://bengribaudo.com/blog/2020/08/26/5417/how-power-query-thinks

    If you want to improve your query time (90 seconds isn't long), you could take more advantage of query folding by reordering your steps. You could also optimise the M that is being run in each query - transformation choice and the order of operations can have a big impact on performance.