I am using XML::Twig to parse the output of Azure's list-blob REST API.
In particular, I want to identify and delete uncommitted orphan blobs, and I am unsure how best to use XML::Twig efficiently to do this. I don't even know where to start.
Ultimately I need to retrieve the <Name> element of the orphaned blobs.
The Azure documentation states:
Uncommitted Blobs in the Response
Uncommitted blobs are listed in the response only if the include=uncommittedblobs parameter was specified on the URI. Uncommitted blobs listed in the response do not include any of the following elements:
Last-Modified, Etag, Content-Type, Content-Encoding, Content-Language, Content-MD5, Cache-Control, Metadata
Therefore, in the following simplified example, you can see an orphan blob called "test" because the <Blob></Blob> block does not contain any of the above elements.
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://my**account.blob.core.windows.net/" ContainerName="testonly">
    <Blobs>
        <Blob>
            <Name>test</Name>
            <Properties>
                <Content-Length>0</Content-Length>
                <BlobType>BlockBlob</BlobType>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
            </Properties>
        </Blob>
    </Blobs>
    <NextMarker/>
</EnumerationResults>
UPDATE:
Actually, I might have oversimplified. The accepted answer does not appear to work with the input below; it prints the name of every blob:
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://my**account.blob.core.windows.net/" ContainerName="testonly">
    <Blobs>
        <Blob>
            <Name>data/users/docx</Name>
            <Properties>
                <Last-Modified>Wed, 10 May 2017 20:21:25 GMT</Last-Modified>
                <Etag>0x8D497E221E7A5AF</Etag>
                <Content-Length>125632</Content-Length>
                <Content-Type>application/octet-stream</Content-Type>
                <Content-Encoding/>
                <Content-Language/>
                <Content-MD5/>
                <Cache-Control/>
                <Content-Disposition/>
                <BlobType>BlockBlob</BlobType>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
            </Properties>
        </Blob>
        <Blob>
            <Name>test</Name>
            <Properties>
                <Content-Length>0</Content-Length>
                <BlobType>BlockBlob</BlobType>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
            </Properties>
        </Blob>
    </Blobs>
    <NextMarker/>
</EnumerationResults>
My code:
sub blob_parse {
    my $blob = $_;
    $blob->first_child($_) and return
        for qw( Last-Modified Etag Content-Type Content-Encoding
                Content-Language Content-MD5 Cache-Control Metadata);
    say "orph: ".$blob->first_child('Name')->text;
}
sub parseAndDelete {
    ### ORPHAN
    $twig_handlers = {'Blobs/Blob' => \&blob_parse};
    $twig = new XML::Twig(twig_handlers=>$twig_handlers);
    $twig->parse($message);
}
Just create a handler for Blob: do nothing if any of the listed elements is present under its Properties child, otherwise print the name. Use the first_child method to inspect the internal structure of a blob.
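#! /usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::Twig;

my $xml = '...';

my $twig = 'XML::Twig'->new(twig_handlers => {
    Blob => sub {
        # In the sample XML the elements to test for are children of
        # <Properties>, not of <Blob> itself.
        my $properties = $_->first_child('Properties');
        # Any of these elements marks a committed blob: skip it.
        $properties->first_child($_) and return
            for qw( Last-Modified Etag Content-Type Content-Encoding
                    Content-Language Content-MD5 Cache-Control Metadata
                  );
        # None were found, so this is an uncommitted (orphan) blob.
        say $_->first_child('Name')->text;
    },
});
$twig->parse($xml);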
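Since the goal is to delete the orphans rather than just print them, the same handler can collect the names into a list instead. The following is only a sketch under a few assumptions: orphan_names is a hypothetical helper name, the sample XML is a trimmed copy of the input from the question's update, and the check looks both directly under Blob and under Properties in case some of the listed elements are not nested inside Properties. It also calls purge after each blob so that already-processed blobs do not accumulate in memory on large listings.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::Twig;

# Elements that, per the quoted Azure documentation, are absent from
# uncommitted blobs.
my @committed_only = qw( Last-Modified Etag Content-Type Content-Encoding
                         Content-Language Content-MD5 Cache-Control Metadata );

# Hypothetical helper: returns the names of all blobs that carry none of
# the elements above.
sub orphan_names {
    my ($xml) = @_;
    my @orphans;
    my $twig = 'XML::Twig'->new(twig_handlers => {
        Blob => sub {
            my ($t, $blob) = @_;
            my $properties = $blob->first_child('Properties');
            # Count the listed elements found under <Blob> or <Properties>
            # (checking both places is an assumption, made as a precaution).
            my $found = grep {
                $blob->first_child($_)
                    or ($properties and $properties->first_child($_))
            } @committed_only;
            push @orphans, $blob->first_child('Name')->text unless $found;
            $t->purge;    # free the blobs handled so far
        },
    });
    $twig->parse($xml);
    return @orphans;
}

# Trimmed copy of the sample from the question's update.
my $sample = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="testonly">
  <Blobs>
    <Blob>
      <Name>data/users/docx</Name>
      <Properties>
        <Last-Modified>Wed, 10 May 2017 20:21:25 GMT</Last-Modified>
        <Etag>0x8D497E221E7A5AF</Etag>
        <Content-Length>125632</Content-Length>
        <Content-Encoding/>
      </Properties>
    </Blob>
    <Blob>
      <Name>test</Name>
      <Properties>
        <Content-Length>0</Content-Length>
        <BlobType>BlockBlob</BlobType>
      </Properties>
    </Blob>
  </Blobs>
  <NextMarker/>
</EnumerationResults>
XML

say for orphan_names($sample);    # prints: test
```

The returned list can then be fed to whatever code issues the delete requests; that part is not shown here because it depends on how you call the Azure API.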