Tags: powershell, azure-data-lake, azure-data-lake-gen2

Get-AzDataLakeStoreChildItemSummary vs Get-AzDataLakeGen2ChildItem


I am trying to calculate the size of an ADLS Gen1 folder and of an ADLS Gen2 container using PowerShell.

The ADLS Gen1 folder and the ADLS Gen2 container hold the same data.

For ADLS Gen1, the folder size calculation script took less than a minute, but the same data in Gen2 takes about an hour.

I need help getting good performance in ADLS Gen2 when calculating container size details.

Is there a command for ADLS Gen2 that is equivalent to the ADLS Gen1 command below?

Get-AzDataLakeStoreChildItemSummary

I checked the Microsoft documentation below but could not find an equivalent command for ADLS Gen2.

https://learn.microsoft.com/en-us/powershell/module/az.storage/?view=azps-9.2.0

I am using the PowerShell script below to get the ADLS Gen1 folder size details:

$adlsactname = "xxxxxx"

$ADLSPaths = "/xxxx/xxx/xxxx"

foreach($ADLSPath in $ADLSPaths){

$ADLS_Size_Path = Get-AzDataLakeStoreChildItemSummary -Account $adlsactname -Path $ADLSPath -Concurrency 128

$string = [pscustomobject]@{
"adlspath" = $ADLSPath
"directoryCount" = $ADLS_Size_Path.directoryCount
"fileCount" = $ADLS_Size_Path.fileCount
"adlssizeinBYTES" = $ADLS_Size_Path.length
"adlsSizeinGB" = ($ADLS_Size_Path.length)/1024/1024/1024
"adlsSizeinTB" = ($ADLS_Size_Path.length)/1024/1024/1024/1024
}
$string
}

Output from ADLS Gen1:

adlspath        : /xxxxx/xxxxx/xxxxx
directoryCount  : 169596
fileCount       : 170170
adlssizeinBYTES : 6680568860813
adlsSizeinGB    : 6221.76459134836
adlsSizeinTB    : 6.07594198373863

I am using the script below to get the ADLS Gen2 container size details:

$StorageAccountName = "xxxxxx"

$actContext = New-AzStorageContext -StorageAccountName $StorageAccountName

$ListContainers = Get-AzStorageContainer -Context $actContext.Context -Name "xxxxxx"

$Counter = $ListContainers.Count

foreach($ListContainer in $ListContainers)
{
    Write-Host "Checking container = $Counter $($ListContainer.Name)"
    $containerName = $ListContainer.Name
    $Token = $Null
    # Collect every page; otherwise only the last page returned would be measured.
    $AllItems = [System.Collections.Generic.List[object]]::new()
    do{
        $Files = @(Get-AzDataLakeGen2ChildItem -Context $actContext -FileSystem $containerName -Recurse -ContinuationToken $Token)
        if($Files.Count -le 0) { Break;}
        $AllItems.AddRange($Files)
        $Token = $Files[$Files.Count -1].ContinuationToken;
      }
    While ($Token -ne $Null)
    $FilesList = @($AllItems | Where {$_.IsDirectory -eq $false})
    $DirList = @($AllItems | Where {$_.IsDirectory -eq $true})
    $FileCount = $FilesList.count
    $DirCount = $DirList.count
    $Total = $AllItems | Measure-Object -Property Length -Sum
    $string = [pscustomobject]@{
                                "storageAccountName" = $StorageAccountName
                                "containerName" = $containerName
                                "directoryCount" = $DirCount
                                "fileCount" = $FileCount
                                "containersizeinBYTES" = $Total.Sum
                                "adlsSizeinGB" = ($Total.Sum)/1024/1024/1024
                                "adlsSizeinTB" = ($Total.Sum)/1024/1024/1024/1024
                                }
    $string 
    $Counter--
}

Solution

  • I need help getting good performance in ADLS Gen2 when calculating container size details. Is there a command for ADLS Gen2 that is equivalent to the ADLS Gen1 command below?

    After reproducing this on my end, I was able to achieve your requirement using Get-AzDataLakeGen2ChildItem. Below is the complete code that worked for me.

    $ctx = New-AzStorageContext -StorageAccountName '<STORAGEACCOUNTNAME>' -StorageAccountKey '<STORAGEACCOUNTKEY>'
    
    $filesystemName = "<CONTAINERNAME>"
    $ListOfFiles = Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $filesystemName -Recurse -FetchProperty
    
    $DirectoryCount=0
    # Initialize the accumulator that the loop and the result object actually use.
    $adlssizeinBYTES=0
    
    foreach($file in $ListOfFiles){
        if($file.IsDirectory){
            $DirectoryCount=$DirectoryCount+1}
        $adlssizeinBYTES=$adlssizeinBYTES+$file.Length}
    
    $string =  [PSCustomObject]@{
        containerName = $filesystemName
        directoryCount = $DirectoryCount
        fileCount = $ListOfFiles.Count - $DirectoryCount
        containersizeinBYTES = $adlssizeinBYTES
        adlsSizeinGB = ($adlssizeinBYTES)/1024/1024/1024
        adlsSizeinTB = ($adlssizeinBYTES)/1024/1024/1024/1024
    }
    
    $string
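
    The same counting can also be done in a single pass over the pipeline, so the full listing does not have to be held in a variable first. The following is a minimal sketch, not a tested replacement: `Measure-DataLakeItemSummary` is a hypothetical helper name, and it only assumes that each input object has `IsDirectory` and `Length` properties, as the items used above do. It still lists every item recursively, so it is not a server-side summary like the Gen1 cmdlet.

```powershell
# Hypothetical helper: summarizes piped listing items in one pass.
function Measure-DataLakeItemSummary {
    [CmdletBinding()]
    param(
        # Any object exposing IsDirectory (bool) and Length (bytes).
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $Item
    )
    begin {
        [long] $directoryCount = 0
        [long] $fileCount = 0
        [long] $totalBytes = 0
    }
    process {
        if ($Item.IsDirectory) {
            $directoryCount++
        }
        else {
            $fileCount++
            $totalBytes += [long] $Item.Length
        }
    }
    end {
        # Emit one summary object after the last item has been processed.
        [pscustomobject]@{
            directoryCount = $directoryCount
            fileCount      = $fileCount
            sizeInBytes    = $totalBytes
            sizeInGB       = $totalBytes / 1GB
            sizeInTB       = $totalBytes / 1TB
        }
    }
}

# Stand-in objects so the sketch runs without an Azure connection.
$sample = @(
    [pscustomobject]@{ IsDirectory = $true;  Length = 0 }
    [pscustomobject]@{ IsDirectory = $false; Length = 1GB }
    [pscustomobject]@{ IsDirectory = $false; Length = 2GB }
)
$summary = $sample | Measure-DataLakeItemSummary
$summary
```

    Against a real account, the listing from the code above would be piped in instead of `$sample`, for example `Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $filesystemName -Recurse | Measure-DataLakeItemSummary`.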
    

    RESULTS: [screenshot of the script output]