We have a .NET application that gets a huge result set (5,000,000+ rows) from Kusto, processes it in memory, and loads it into Azure Cosmos DB.
We are using IDataReader to avoid loading all the data into memory at once. However, we find that before the first record is read from the IDataReader, all results have already been loaded into memory. Is there a way to truly stream the result set?
using (ICslQueryProvider client = KustoClientFactory.CreateCslQueryProvider(connectionString))
{
    string query = "PageViewEvents | // some aggregation logic...";
    IDataReader reader = client.ExecuteQuery(query, new ClientRequestProperties());
    // At this point, all results have already been loaded into memory. That takes 2 GB of memory!
    while (reader.Read())
    {
        // Load current record to Azure Cosmos DB
    }
}
Try setting the streaming property as part of your connection string: https://learn.microsoft.com/en-us/azure/kusto/api/connection-strings/kusto#client-communication-properties
If the streaming option doesn't help you, you could 'split' your currently single query into multiple ones (according to a specific column in the data, e.g. a datetime column, or ingestion_time(), or using hash()).
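As a rough illustration of the hash()-based split, here is a minimal sketch that only builds the query text for each partition. The class name QuerySplitter, the method BuildPartitionedQueries, and its parameters are hypothetical helpers made up for this example, not part of the Kusto SDK. It assumes the filter can be applied directly after the table name and that the key column is also the one the aggregation groups by, so that every group is computed entirely within one partition.

```csharp
using System;
using System.Collections.Generic;

public static class QuerySplitter
{
    // Builds one KQL query per hash bucket.
    // table, keyColumn and restOfQuery are illustrative placeholders (assumptions),
    // not names taken from the original question.
    public static List<string> BuildPartitionedQueries(
        string table, string keyColumn, string restOfQuery, int partitions)
    {
        if (partitions <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(partitions));
        }

        var queries = new List<string>(partitions);
        for (int i = 0; i < partitions; i++)
        {
            // KQL hash(column, mod) returns a value between 0 and mod - 1,
            // so every row matches exactly one bucket.
            queries.Add($"{table} | where hash({keyColumn}, {partitions}) == {i} | {restOfQuery}");
        }
        return queries;
    }
}
```

Each generated query could then be passed to client.ExecuteQuery in turn, with its reader fully consumed and written to Cosmos DB before the next query runs, so that only one partition's results are held at a time. The number of partitions is a tuning choice: more partitions mean less memory per query but more round trips.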