.net, azure-data-explorer

Is there a way to stream results in the Kusto Data .NET library?


We have a .NET application that gets a huge result set (5,000,000+ rows) from Kusto, processes it in memory, and loads it into Azure Cosmos DB.

We are using IDataReader to avoid loading all the data into memory at the same time. However, we find that before the first record is read from the IDataReader, all results have already been loaded into memory. Is there a way to do true streaming of the result set?

using (ICslQueryProvider client = KustoClientFactory.CreateCslQueryProvider(connectionString))
{
    string query = @"PageViewEvents | // some aggregation logic...";
    IDataReader reader = client.ExecuteQuery(query, new ClientRequestProperties());
    // At this point, all results have already been loaded into memory. That takes 2 GB of memory!
    while (reader.Read())
    {
        // Load current record to Azure Cosmos DB
    }
}

Solution

    1. You may want to try specifying the streaming property as part of your connection string: https://learn.microsoft.com/en-us/azure/kusto/api/connection-strings/kusto#client-communication-properties
    2. What is the use case for exporting 5M (raw?) records from Kusto/ADX to Cosmos DB? Are you running analytics or aggregations over this data in Cosmos DB?
      • Depending on your use case, you could consider performing additional filtering or aggregation in your Kusto query before exporting to Cosmos DB.
      • If, for whatever reason, the streaming option doesn't help, you could split your single query into multiple queries, partitioned on a specific column in the data (e.g. a datetime column, ingestion_time(), or hash() of a column).
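
The query-splitting idea in the last bullet could look roughly like the sketch below. It is only an illustration under assumptions: the partition column `UserId` and the partition count of 16 are hypothetical, and it assumes `hash(column, N)` yields values from 0 to N-1 so that every row lands in exactly one partition. Each partition is queried and drained separately, so only one slice of the result set is held by the client at a time.

```csharp
using System.Collections.Generic;
using System.Data;
using Kusto.Data.Common;
using Kusto.Data.Net.Client;

static class PartitionedExport
{
    // Builds one query per partition by filtering on hash(<column>, <partitions>).
    // tableName and partitionColumn are placeholders chosen for illustration.
    public static IEnumerable<string> BuildPartitionQueries(
        string tableName, string partitionColumn, int partitions)
    {
        for (int i = 0; i < partitions; i++)
        {
            yield return $"{tableName} | where hash({partitionColumn}, {partitions}) == {i}";
        }
    }

    public static void Run(string connectionString)
    {
        using (ICslQueryProvider client = KustoClientFactory.CreateCslQueryProvider(connectionString))
        {
            // "UserId" and 16 are assumptions; pick a column and count that fit your data.
            foreach (string query in BuildPartitionQueries("PageViewEvents", "UserId", 16))
            {
                using (IDataReader reader = client.ExecuteQuery(query, new ClientRequestProperties()))
                {
                    while (reader.Read())
                    {
                        // Load current record to Azure Cosmos DB
                    }
                }
            }
        }
    }
}
```

If the real query aggregates, the partition filter would need to come before the aggregation, and the partition column should be one of the grouping keys; otherwise the per-partition results would be partial aggregates that still have to be combined.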