What is the best approach / design to query a remote data table in Power BI when multiple following queries depend on the queried data table to prevent the remote data table to queried multiple times?
Details
Let's say I have a remote data base server my_db_server
with a database my_db
which I query for a specific data table my_source_data_table
. Afterwards I am using two other queries in PowerBI which depend on the initial query of the remote data table. I want to execute the query of the remote table only ONCE, and then use the cached data in an efficient way for the other two queries to prevent querying the remote database server multiple times.
Step 1 - Access remote database
Hence I add the following query in PowerBI to access the remote table
let
Source = Sql.Databases("<my_db_server>"),
Database = Source{[Name="<my_db>"]}[Data],
MySourceDataTable = Database {[Schema="dbo",Item="<my_source_datatable>"]}[Data],
in
MySourceDataTable
Step 2 - Depending queries
I now have two sub queries, which use this table as input and query for them, e.g.
Query A - Order columns by Foo,Bar
let
Source = MySourceDataTable,
ReorderColumnsFooBar = Table.ReorderColumns(Source ,{"Foo", "Bar"})
in
ReorderColumnsFooBar
Query B - Order columns by Bar, Foo
let
Source = MySourceDataTable,
ReorderColumnsBarFoo = Table.ReorderColumns(Source ,{"Bar", "Foo"})
in
ReorderColumnsBarFoo
How does PowerBI proceed in this case? Is it like:
Query remote database to resolve MySourceDataTable
Resolve ReorderColumnsFooBar
using cached MySourceDataTable
Resolve ReorderColumnsBarFoo
using cached MySourceDataTable
Does it use a cached version of 1) or are steps 2) and 3) each executing query 1) again?
Can I improve the situation using a Table.Buffer(MySourceDataTable)
at the end of 1) to explicitly cache the table?
Testing these different approaches didn't seem to bring a boost in speed. Currently my whole query takes about 90sec and I would like to speed it up, since the remote query takes up 80% of the time.
Power Query streams data each time from the source. For instance, if you have two queries which both get data from a DB and the data changes in the split second between the execution of query 1 and query 2, you will get different data extracted.
Table.Buffer only caches within a single query so to answer your question, steps 2 & 3 both execute query 1 again.
You can read more background here: https://bengribaudo.com/blog/2020/08/26/5417/how-power-query-thinks
If you want to improve your query time (90 seconds isn't long), you could take more advantage of query folding by reordering your steps. You could also optimise the M that is being run in each query - transformation choice and the order of operations can have a big impact on performance.