I need some advice on how to upload a large CSV file (about one million lines) into Azure Table Storage with PowerShell.
I am aware of Add-AzTableRow -table $using:table -partitionKey $partitionKey -rowKey $rowKey -property $tableProperty
https://github.com/paulomarquesc/AzureRmStorageTable/tree/master/docs
Based on tests with a smaller file of 10 000 lines, it takes about 10 minutes to upload when processing sequentially (which extrapolates to roughly 16-20 hours for 1 000 000 lines).
I have tried the smaller file (10 000 lines) with PowerShell 7 and ForEach-Object -Parallel, but the funny thing is that it takes about 3 times longer: out of three tests, two uploaded the 10 000 lines in about 30 minutes and one took 47 minutes.
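For reference, the parallel attempt was shaped roughly like this (just a sketch based on the description above, with the property hashtable trimmed; the actual script differs):

$CSVTable | ForEach-Object -Parallel {
    # Each runspace needs the AzTable module and gets the table reference via $using:
    Import-Module AzTable
    $entry = $_
    $tableProperty = @{
        'FirstName' = $entry.FirstName
        'LastName'  = $entry.LastName
        # ...remaining columns as in the sequential version below
    }
    Add-AzTableRow -Table $using:table -PartitionKey $entry.ID -RowKey $entry.UserName -Property $tableProperty | Out-Null
} -ThrottleLimit 10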
Just for comparison, it took less than an hour to upload the one million line file with Storage Explorer! So I was wondering what process it uses and whether the same approach could be used from PowerShell.
I have also reviewed an article that should work, but it returns an error on: $table.CloudTable.ExecuteBatch($batchOperation)
So my question would be: Is there any way to load data into Azure table storage in parallel?
As requested, adding the code used.
Note: the code works just fine, but it takes time and I believe it could be faster. I'm looking for suggestions on how to improve it.
# Get the storage account context and create the target table
$SaContext = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName).Context
$tableName = 'Table1000000'
New-AzStorageTable -Name $tableName -Context $SaContext
$table = (Get-AzStorageTable -Name $tableName -Context $SaContext).CloudTable

# Load the whole CSV into memory
$CSVTable = Import-Csv -Path C:\TEMP\DEMO_CSV\Test-1000000-20200513223059.csv

$i = 0
$totalcount = $CSVTable.Count

foreach ($entry in $CSVTable) {
    $i++
    Write-Progress -Activity "Importing Data" -Status "Item: $i of $totalcount" -PercentComplete (($i / $totalcount) * 100)

    # ID becomes the PartitionKey and UserName the RowKey
    $partitionKey = $entry.ID
    $rowKey = $entry.UserName
    $tableProperty = @{
        'SISID'         = $entry.SISID
        'FirstName'     = $entry.FirstName
        'LastName'      = $entry.LastName
        'Password'      = $entry.Password
        'StateID'       = $entry.StateID
        'StudentNumber' = $entry.PersonalNumber
        'MiddleName'    = $entry.MiddleName
        'Grade'         = $entry.Grade
        'Status'        = $entry.Status
        'BirthDate'     = $entry.BirthDate
        'GradYear'      = $entry.GradYear
    }

    # One call (and one REST request) per row - this is the sequential bottleneck
    Add-AzTableRow -table $table -partitionKey $partitionKey -rowKey $rowKey -property $tableProperty
}
According to the script you provided, you use the command (Get-AzStorageTable -Name $tableName -Context $saContext).CloudTable to get a CloudTable object. Its type is Microsoft.Azure.Cosmos.Table.CloudTable.
So if we want to execute batch operations against it, the batch operation has to be of type Microsoft.Azure.Cosmos.Table.TableBatchOperation. Besides, please note that all entities in a batch must have the same PartitionKey; if your entities have different PartitionKeys, they need to go into separate batches. A single batch can also contain at most 100 entities.
For example:
[Microsoft.Azure.Cosmos.Table.TableBatchOperation]$batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation

# Both entities share the PartitionKey "Hury", so they can go into the same batch
$e = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity("Hury", "test1")
$e.Properties.Add("Age", "20")
$e1 = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity("Hury", "test2")
$e1.Properties.Add("Age", "25")

$batchOperation.InsertOrReplace($e)
$batchOperation.InsertOrReplace($e1)

# ExecuteBatchAsync returns a Task, so wait for it to finish
$table.CloudTable.ExecuteBatchAsync($batchOperation).GetAwaiter().GetResult()
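To apply this to the CSV from the question, one possible approach (a sketch only, assuming $table is the CloudTable object and $CSVTable the imported CSV from the script above, and showing only two of the columns) is to group the rows by PartitionKey and flush a batch every 100 entities:

$groups = $CSVTable | Group-Object -Property ID   # ID is used as the PartitionKey in the question

foreach ($group in $groups) {
    $batch = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation

    foreach ($entry in $group.Group) {
        $e = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity($entry.ID, $entry.UserName)
        $e.Properties.Add("FirstName", $entry.FirstName)
        $e.Properties.Add("LastName", $entry.LastName)   # add the remaining columns the same way

        $batch.InsertOrReplace($e)

        if ($batch.Count -eq 100) {
            # A single batch accepts at most 100 entities, so flush and start a new one
            $table.ExecuteBatchAsync($batch).GetAwaiter().GetResult() | Out-Null
            $batch = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation
        }
    }

    if ($batch.Count -gt 0) {
        # Flush whatever is left for this PartitionKey
        $table.ExecuteBatchAsync($batch).GetAwaiter().GetResult() | Out-Null
    }
}

Keep in mind that batching only cuts down the number of round trips when many rows share a PartitionKey. If every row has a unique ID (and therefore its own partition), each batch ends up with a single entity and you will not gain much over Add-AzTableRow.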