powershell, azure-storage

Uploading large csv to table storage with PowerShell (not sequential)


I need advice on how to upload a large CSV file (about one million lines) into Azure Table Storage with PowerShell. I am aware of Add-AzTableRow -table $using:table -partitionKey $partitionKey -rowKey $rowKey -property $tableProperty

https://github.com/paulomarquesc/AzureRmStorageTable/tree/master/docs

Based on tests with a smaller file of 10,000 lines, it takes about 10 minutes to upload when processing sequentially. (That works out to roughly 16-20 hours for 1,000,000 lines.)

I have tried the smaller file (10,000 lines) with PowerShell 7 using ForEach-Object -Parallel, but oddly enough it takes about 3 times longer. Out of three tests, two uploaded the 10,000-line file in about 30 minutes and one took 47 minutes.
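The parallel attempt was roughly along these lines (simplified; the property list matches the sequential code below, and the -ThrottleLimit value is just an example):

    $CSVTable | ForEach-Object -Parallel {
        $entry = $_
        $tableProperty = @{
            'FirstName' = $entry.FirstName
            'LastName'  = $entry.LastName
            # ...remaining columns as in the sequential version below
        }
        # $using: is required to reference variables from the parent scope inside -Parallel
        Add-AzTableRow -table $using:table -partitionKey $entry.ID -rowKey $entry.UserName -property $tableProperty
    } -ThrottleLimit 10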

Just for comparison, it took less than an hour to upload the one-million-line file with Storage Explorer! So I was wondering what process it uses and whether that could be used from PowerShell.

I have reviewed the following article:

https://blog.kloud.com.au/2019/02/05/loading-and-querying-data-in-azure-table-storage-using-powershell/

It should work, but it returns an error on this line: $table.CloudTable.ExecuteBatch($batchOperation)

So my question would be: Is there any way to load data into Azure table storage in parallel?

As requested, adding the code used.

Note: the code works just fine, but it takes time and I believe it could be faster. I am looking for suggestions on how to improve it.

$SaContext = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName).Context
$tableName = 'Table1000000'
New-AzStorageTable -Name $tableName -Context $SaContext
$table = (Get-AzStorageTable -Name $tableName -Context $saContext).CloudTable
$CSVTable = Import-Csv -Path C:\TEMP\DEMO_CSV\Test-1000000-20200513223059.csv
$i=0
$totalcount = $CSVTable.count

foreach ($entry in $CSVTable) {
    $i++
    Write-Progress -Activity "Importing Data" -Status "Item: $i of $totalcount" -PercentComplete (($i/$totalcount)*100);
    $partitionKey = $entry.ID
    $rowKey = $entry.UserName
    $tableProperty = @{
        'SISID' = $entry.SISID
        'FirstName' = $entry.FirstName
        'LastName' = $entry.LastName
        'Password' = $entry.Password
        'StateID' = $entry.StateID
        'StudentNumber' = $entry.PersonalNumber
        'MiddleName' = $entry.MiddleName
        'Grade' = $entry.Grade
        'Status' = $entry.Status
        'BirthDate' = $entry.BirthDate
        'GradYear' = $entry.GradYear
    }
    Add-AzTableRow -table $table -partitionKey $partitionKey -rowKey $rowKey -property $tableProperty
}

Solution

  • According to the script you provided, you use the command (Get-AzStorageTable -Name $tableName -Context $saContext).CloudTable to get a CloudTable object, whose type lives in the Microsoft.Azure.Cosmos.Table namespace. So if we want to execute batch operations against it, the batch operation must be of the matching type Microsoft.Azure.Cosmos.Table.TableBatchOperation. Besides, please note that all entities in one batch must have the same PartitionKey, and a single batch can contain at most 100 entities; if your entities have different PartitionKeys, they need to go into separate batches.

    For example

    [Microsoft.Azure.Cosmos.Table.TableBatchOperation]$batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation
    
    # Both entities use the same PartitionKey ("Hury"), as required within a single batch
    $e = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity("Hury", "test1")
    $e.Properties.Add("Age", "20")
    $e1 = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity("Hury", "test2")
    $e1.Properties.Add("Age", "25")
    
    $batchOperation.InsertOrReplace($e)
    $batchOperation.InsertOrReplace($e1)
    
    # Wait for the async call so the batch completes before the script moves on
    $table.CloudTable.ExecuteBatchAsync($batchOperation).GetAwaiter().GetResult()
    
    

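    Applied to the CSV import from the question, a batched upload could look roughly like the sketch below. It reuses the question's variable and column names, assumes the ID column is used as the PartitionKey, groups rows by it (all entities in one batch must share a PartitionKey), and splits each group into chunks of at most 100 entities (the batch size limit). $table here is already the CloudTable, as in the question's script:

    $table = (Get-AzStorageTable -Name $tableName -Context $SaContext).CloudTable
    $CSVTable = Import-Csv -Path C:\TEMP\DEMO_CSV\Test-1000000-20200513223059.csv
    
    # Group rows by PartitionKey, since a batch may only contain one PartitionKey
    foreach ($group in ($CSVTable | Group-Object -Property ID)) {
        # Split each group into chunks of at most 100 entities (the batch size limit)
        for ($offset = 0; $offset -lt $group.Count; $offset += 100) {
            $batch = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation
            foreach ($entry in ($group.Group | Select-Object -Skip $offset -First 100)) {
                $entity = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity($entry.ID, $entry.UserName)
                $entity.Properties.Add('FirstName', $entry.FirstName)
                $entity.Properties.Add('LastName', $entry.LastName)
                # add the remaining columns the same way
                $batch.InsertOrReplace($entity)
            }
            # One round trip per batch instead of one per row
            $table.ExecuteBatchAsync($batch).GetAwaiter().GetResult()
        }
    }

    If every row has a unique ID, each group will contain a single entity and batching will not reduce the number of requests; in that case it is worth choosing a PartitionKey that groups related rows, and the per-partition batches can then also be submitted in parallel.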