Search code examples
c#excel-interopnpoi

Reading images from excel (with row and cell position) using excel introp or npoi


I need to extract images from excel with start&&end row no. and start && end col no

My current code is as below:-

  var excelApp = new Application();
  var wb = excelApp.Workbooks.Open(filePath, Type.Missing, Type.Missing, 
          Type.Missing, Type.Missing, Type.Missing, Type.Missing,
          Type.Missing, Type.Missing, Type.Missing, Type.Missing, 
          Type.Missing, Type.Missing, Type.Missing, Type.Missing);
  var ws = (Worksheet)wb.Worksheets["Sheet1"];
   int startCol = 0;
   int startRow =0;
   int endCol = 0;
   int endRow = 0;
  foreach (var pic in ws.Pictures())
                    {
                        int startCol = pic.TopLeftCell.Column;
                        int startRow = pic.TopLeftCell.Row;
                        int endCol = pic.BottomRightCell.Column;
                        int endRow = pic.BottomRightCell.Row;
                     }

Above code works fine when all images are different but when I put the same image in different cells than it picks only first one.

For example, Works fine when i put abc.jpeg at B1 cell and xyz.jpeg at C5 cell then results are two object first startRow=1,endRow=1,startCol=1,endCol=1 and second startRow=5,endRow=5,startCol=2,endCol=2

But if I put abc.jpeg at B1 cell and C5 cell then result is one object with startRow=1,endRow=1,startCol=1,endCol=1 for both images.It doesn't pic the second image.

Why it's happing?Is there any solution using interop or npoi


Solution

  • TL;DR - NPOI behaves the same as Excel Interop, returning one image when the same image is added twice. It likely does so for the same reason. EPPlus (the last test in this post) handles this scenario the way you would expect, identifying both instances of the picture separately and returning their positions on the worksheet.

    I tried first with NPOI. I created a workbook and inserted the same picture into the first sheet in two locations.

    -----
    |   |
    -----
    
        -----
        |   |
        -----
    

    Using NPOI

    using Microsoft.VisualStudio.TestTools.UnitTesting;
    using NPOI.XSSF.UserModel;
    using System.IO;
    
    namespace ExcelImageTests
    {
        [TestClass]
        public class NpoiExcelImages
        {
            [TestMethod]
            public void FindsTwoDistinctImagesInFile()
            {
                XSSFWorkbook workbook;
                using (var file = new FileStream(@"C:\Users\path-to-my-file\sotest.xlsx", 
                    FileMode.Open, FileAccess.Read))
                {
                    workbook = new XSSFWorkbook(file);
                }
                var pictures = workbook.GetAllPictures();
                Assert.AreEqual(2, pictures.Count);
            }
        }
    }
    

    This is as far as I got with NPOI. The test fails. NPOI counts one picture, not two. What's odd is that it also has no reference to shapes, pictures, or drawings at the sheet level. The picture returned is of type XSSFPictureData and contains the binary data for the picture. It doesn't refer to the relationship between the worksheet and the picture. I suspect that's why it's only returning one. There's one image embedded twice.

    To confirm I added another picture distinct from the first two. Now the test passed. There are three pictures visible on the sheet but two distinct pictures returned by GetAllPictures().

    You mentioned Interop and NPOI, but another option is EPPlus. It's more commonly used, and after a few minutes with NPOI I can see why. NPOI returns a lot of object types, just like Excel Interop, and you have to know what they are so you can cast them to those types.

    EPPlus is just a whole lot better. Here's the same test with EPPlus:

    using Microsoft.VisualStudio.TestTools.UnitTesting;
    using OfficeOpenXml;
    using System.IO;
    
    namespace ExcelImageTests
    {
        [TestClass]
        public class EPPlusExcelImages
        {
            [TestMethod]
            public void FindsTwoDistinctImagesInFile()
            {
                var file = new FileInfo(@"C:\Users\path-to-my-file\sotest.xlsx");
                using (var package = new ExcelPackage(file))
                {
                    var workbook = package.Workbook;
                    var sheet = workbook.Worksheets[1];
                    Assert.AreEqual(2, sheet.Drawings.Count)
                    var drawingOne = sheet.Drawings[0];
                    var drawingTwo = sheet.Drawings[1];
                    // From returns the position of the upper left corner of the picture.
                    // To returns the position of the lower right corner.
                    Assert.IsTrue(drawingOne.From.Row < drawingTwo.From.Row);
                    Assert.IsTrue(drawingOne.From.Column < drawingTwo.From.Column);
                    Assert.IsTrue(drawingOne.To.Row < drawingTwo.To.Row);
                    Assert.IsTrue(drawingOne.To.Column < drawingTwo.To.Column);
                }
            }
        }
    }
    

    This test passes. It detects two images, and correctly tells me their relative positions. You don't have my worksheet, but I checked and the rows and columns are all correct.

    One odd detail is that the worksheet index is 1-based but the rows and columns are 0-based. But that's no big deal.

    Also, while all of the objects returned from the package are IDisposable, most examples only show disposing the package itself. One person noted that the Dispose methods for the other objects are empty. That's weird. But it's still better then Excel Interop where you have to release COM objects.