Tags: c#, azure, pdf, azure-function-app

Extract embedded files from an Azure blob in C#


I have PDF files embedded inside a file that is stored as an Azure blob. I want to extract those files from the blob.

Here is what I have done so far:

  1. Created an HTTP-triggered function app.
  2. Established a connection to the storage container.
  3. Fetched the blob.

To get the embedded files, I am using the following code:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
// Legacy storage SDK; depending on the package, the namespaces may be
// Microsoft.Azure.Storage and Microsoft.Azure.Storage.Blob instead.
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

namespace PDFDownloader {
  public static class Function1 {
    [FunctionName("Function1")]
    public static async Task<IActionResult> Run(
      [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
      ILogger log) {
      log.LogInformation($"GetVolumeData function executed at: {DateTime.Now}");
      try {
        // Connect to the storage account and get the container
        CloudStorageAccount storageAccount = CloudStorageAccount.Parse(Parameter.ConnectionString);
        CloudBlobClient cloudBlobClient = storageAccount.CreateCloudBlobClient();
        CloudBlobContainer cloudcontainer = cloudBlobClient.GetContainerReference(Parameter.SuccessContainer);

        // List the first segment of blobs in the container
        BlobResultSegment resultSegment = await cloudcontainer.ListBlobsSegmentedAsync(currentToken: null);
        IEnumerable<IListBlobItem> blobItems = resultSegment.Results;

        string response = "";
        int count = 0;

        foreach (IListBlobItem item in blobItems) {
          if (item is CloudBlockBlob blob) {
            count++;
            // Reads the blob as text; only the last blob's content is kept
            response = await blob.DownloadTextAsync();
          }
        }

        if (count == 0) {
          return new OkObjectResult("Error : File Not Found !!");
        } else {
          return new OkObjectResult(response);
        }
      } catch (Exception ex) {
        log.LogError($"Function Exception Message: {ex.Message}");
        return new OkObjectResult(ex.Message);
      } finally {
        log.LogInformation($"Function- ENDED ON : {DateTime.Now}");
      }
    }
  }
}

How can I read the embedded files from the blob content and return them in the HTTP response?


Solution
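
One thing worth checking in the question's code first: DownloadTextAsync decodes the blob as text (UTF-8 by default), and a PDF is binary, so the embedded files are unlikely to survive that step. A minimal sketch of the effect, using only the base class library; the sample bytes and the class name BinaryRoundTrip are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryRoundTrip
{
    // Decodes bytes as UTF-8 text and encodes them again, which is
    // effectively what happens when binary data is passed around as a string.
    public static byte[] ThroughText(byte[] data)
    {
        string asText = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(asText);
    }

    public static void Main()
    {
        // "%PDF" followed by two bytes that are not valid UTF-8 (illustrative data)
        byte[] original = { 0x25, 0x50, 0x44, 0x46, 0xFF, 0xFE };
        byte[] roundTripped = ThroughText(original);

        Console.WriteLine(original.SequenceEqual(roundTripped)
            ? "bytes survived the text round trip"
            : "bytes were altered by the text round trip");
    }
}
```

Reading the blob as a stream or a byte array avoids the decoding step altogether.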

Open the blob as a stream and load it into the AttachmentExtractor from the Bytescout PDF Extractor SDK:

using Bytescout.PDFExtractor;

// Read your blob as a stream like this
var stream1 = await blob.OpenReadAsync();

AttachmentExtractor extractor = new AttachmentExtractor();
extractor.RegistrationName = "demo";
extractor.RegistrationKey = "demo";

// Load the PDF document from the blob stream
// (LoadDocumentFromFile expects a file path, not a stream)
extractor.LoadDocumentFromStream(stream1);

for (int i = 0; i < extractor.Count; i++)
{
    Console.WriteLine("Saving attachment: " + extractor.GetFileName(i));
    // Save attachment to file
    extractor.Save(i, extractor.GetFileName(i));
    Console.WriteLine("File size: " + extractor.GetSize(i));
}

extractor.Dispose();
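
The snippet above saves the attachments to disk, while the question asks for them to be sent over HTTP. One possible approach, sketched here with only System.IO.Compression, is to pack the attachments into a single in-memory ZIP and return those bytes from the function. AttachmentZipper and Pack are hypothetical names, the sample data is invented, and how each attachment's bytes are obtained from the extractor is deliberately left out:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class AttachmentZipper
{
    // Packs file name -> content pairs into one ZIP archive held in memory.
    public static byte[] Pack(IReadOnlyDictionary<string, byte[]> attachments)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the buffer usable after the archive is disposed
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (var pair in attachments)
                {
                    ZipArchiveEntry entry = zip.CreateEntry(pair.Key);
                    using (Stream entryStream = entry.Open())
                    {
                        entryStream.Write(pair.Value, 0, pair.Value.Length);
                    }
                }
            }
            // The archive is disposed at this point, so it is complete
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        // Stand-ins for attachments extracted from the PDF
        var attachments = new Dictionary<string, byte[]>
        {
            ["a.pdf"] = Encoding.ASCII.GetBytes("%PDF-1.4 first"),
            ["b.pdf"] = Encoding.ASCII.GetBytes("%PDF-1.4 second"),
        };

        byte[] zipBytes = Pack(attachments);

        // Read the archive back to check its contents
        using (var zip = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                Console.WriteLine($"{entry.FullName}: {entry.Length} bytes");
            }
        }
    }
}
```

Inside the function, the resulting bytes could then be returned with something like new FileContentResult(zipBytes, "application/zip") { FileDownloadName = "attachments.zip" } in place of OkObjectResult. If there is only ever one attachment, returning its bytes directly with the matching content type would be simpler.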