I have a dataflow whose source is a Kusto query. This source has over 100M rows, which I am partitioning using a Kusto hash function from a ForEach activity.
In my Kusto query, one of the where clauses filters on the SubscriptionId field using values from a blob in my ADLSv2 account. Instead of hardcoding that list, I would like to get those subscriptions from another Kusto query as a second source (image below), but without using the lookup or join steps, because I need to filter the data directly in the source instead of projecting and then filtering. Otherwise, it might break, consume too much memory, etc.
So, is it possible to filter my main Kusto query in a where clause using the results of ANOTHER Kusto source (aka CPCDevices)?
This is my dataflow:
This is my main Kusto query as example:
Table
| project column1, column2, column3
| where SubscriptionId in ({CPCDevices})
//| other things in my query here..
The CPCDevices in the where clause comes from the CPCDevices source.
UPDATE
After adding another source, this is the updated dataflow with the QueryOutput derived columns, as suggested in this solution.
You can try a cache sink here, but I haven't tested this with large data.
This is my Table1 sample data:
Id Name Age
1 Rakesh 22
2 Laddu 22
3 Virat 35
4 ABD 40
5 MSD 42
Table2 sample data:
Table2Id Marks
1 94
3 100
5 98
Here I am projecting the columns Id and Name from Table1, with the condition Id in Table2Id.
First add a cache sink to the Table2 source.
Now, use the cache sink outputs array in the source query of Table1 like below.
"Table1| project Id,Name| where Id in (dynamic({map(sink1#outputs(),#item.Table2Id)}))"
Make sure the write order is 1 for the cache sink.
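To see what this evaluates to, consider the sample data above: sink1#outputs() returns the cached Table2 rows, and map(sink1#outputs(), #item.Table2Id) extracts just the Table2Id values. This is a sketch, assuming the cache sink is named sink1 as in the expression above; the query that actually reaches Kusto would be roughly:

Table1
| project Id, Name
| where Id in (dynamic([1, 3, 5]))  // [1, 3, 5] = the Table2Id values from the cache sink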
Result of Table1 source:
If the above approach does not work for large data, then a join might be the best option here. First join the two tables in the source KQL script itself and then project from the result table.
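A minimal sketch of that join-based variant in KQL, using the sample tables and column names shown above:

Table1
| join kind=inner (Table2 | project Table2Id) on $left.Id == $right.Table2Id
| project Id, Name

This pushes the filtering into Kusto itself, so only the matching rows ever reach the data flow.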