
Receiving a large amount of data from Azure Storage and processing it


I need to migrate some data from Azure Table Storage to a SQL database.

I have the following code:

class AzureDataAccessManager : IAzureDataAccessManager
{
    private readonly CloudTable tableClient;

    private readonly CloudStorageAccount storageAccount;

    public string TableName { get; }

    public AzureDataAccessManager(string connectionString, string tableName)
    {
        TableName = tableName ?? throw new ArgumentNullException(nameof(tableName));

        if (connectionString == null) throw new ArgumentNullException(nameof(connectionString));

        storageAccount = CloudStorageAccount.Parse(connectionString);

        tableClient = storageAccount.CreateCloudTableClient().GetTableReference(TableName);
    }

    public List<T> QueryAllRecords<T>() where T : class, ITableEntity, new()
    {
        TableContinuationToken token = null;

        var entities = new List<T>();
        do
        {
            var queryResult = tableClient.ExecuteQuerySegmented(new TableQuery<T>(), token);
            entities.AddRange(queryResult.Results);
            token = queryResult.ContinuationToken;

        } while (token != null);

        return entities;
    }
}

I retrieve all the records like this:

var result = azureTableManager.QueryAllRecords<AzureCpaDataEntity>();

The problem is that I don't know how many rows the table contains. What if there are too many to hold in memory? Maybe I could fetch them in ranges (10,000 at a time or so), but as far as I can see, List has no method for that.

Help me with some solutions or ideas, please!

Thanks!


Solution

  • The question's code already retrieves results in batches. Instead of waiting for all of them to arrive, the method could be turned into an iterator that returns each batch as soon as it is received:

    public IEnumerable<List<T>> QueryRecords<T>() where T : class, ITableEntity, new()
    {
        TableContinuationToken token = null;

        do
        {
            // Fetch one segment; the service decides how many entities it contains.
            var queryResult = tableClient.ExecuteQuerySegmented(new TableQuery<T>(), token);
            token = queryResult.ContinuationToken;

            // Hand this segment to the caller before requesting the next one.
            yield return queryResult.Results;

        } while (token != null);
    }
    

    The results should be processed in batches as well:

    foreach (var batch in azureTableManager.QueryRecords<AzureCpaDataEntity>())
    {
        ProcessTheBatch(batch);
    }
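Azure Table storage returns at most 1,000 entities per segment, so each batch from the iterator above holds at most that many. The iterator pattern can be tried without an Azure account. The sketch below assumes C# top-level statements (.NET 6 or later). It replaces `ExecuteQuerySegmented` with a hypothetical `FetchSegment` that hands out integers page by page with an `int?` continuation token, so only the iterator logic is exercised.

It also adds a hypothetical `Rebatch` helper. This helper regroups the service's segments into chunks of a chosen size, for example the 10,000 rows mentioned in the question. It is an illustration, not part of any Azure API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ExecuteQuerySegmented: returns up to pageSize
// integers starting at the position stored in the token, plus the next token
// (null once everything has been returned). No Azure calls are made.
(List<int> Results, int? ContinuationToken) FetchSegment(int? token, int total, int pageSize)
{
    int start = token ?? 0;
    var results = Enumerable.Range(start, Math.Min(pageSize, total - start)).ToList();
    int next = start + results.Count;
    return (results, next < total ? next : (int?)null);
}

// Same loop as QueryRecords above: each segment is yielded as soon as it
// arrives, and the next one is fetched only when the caller asks for it.
IEnumerable<List<int>> QueryRecords(int total, int pageSize)
{
    int? token = null;
    do
    {
        var segment = FetchSegment(token, total, pageSize);
        token = segment.ContinuationToken;
        yield return segment.Results;
    } while (token != null);
}

// Hypothetical helper: regroups segments of any size into chunks of chunkSize
// items (the last chunk may be smaller). A fresh list is started after each
// yield, so chunks already handed to the caller are never modified.
IEnumerable<List<T>> Rebatch<T>(IEnumerable<List<T>> segments, int chunkSize)
{
    var buffer = new List<T>(chunkSize);
    foreach (var segment in segments)
    {
        foreach (var item in segment)
        {
            buffer.Add(item);
            if (buffer.Count == chunkSize)
            {
                yield return buffer;
                buffer = new List<T>(chunkSize);
            }
        }
    }

    if (buffer.Count > 0)
        yield return buffer;
}

// ToList() is used here only to print the sizes; a real migration would
// consume the sequences with foreach so earlier batches can be collected.
var batches = QueryRecords(total: 2500, pageSize: 1000).ToList();
Console.WriteLine(string.Join(",", batches.Select(b => b.Count)));  // 1000,1000,500

var chunks = Rebatch(QueryRecords(total: 2500, pageSize: 1000), 600).ToList();
Console.WriteLine(string.Join(",", chunks.Select(c => c.Count)));   // 600,600,600,600,100
```

In the migration itself, a generic `Rebatch` moved into a static helper class could wrap the real iterator, e.g. `foreach (var chunk in Rebatch(azureTableManager.QueryRecords<AzureCpaDataEntity>(), 10000)) ProcessTheBatch(chunk);`. Memory use is then bounded by one chunk plus the segment being read, rather than by the size of the table.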