
Excel to XML for data stripping


I am trying to extract data from thousands of identically structured Excel 2007/2010 files, and I would prefer to do this with scraping techniques. As far as I know, the file is basically some sort of XML format, so is it possible to scrape an Excel file directly?

So, is it possible to convert an Excel file to XML or some other markup format?


Solution

  • The XLSX format is actually a ZIP archive with a different extension. If you unzip it with your favorite zip program, you'll find the worksheet data inside xl/worksheets, where each worksheet is saved as a separate XML document. Note that text cells usually store only an index into xl/sharedStrings.xml rather than the text itself. You should be able to use XSLT, as Michael suggested, to extract the data you require.
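
If you would rather script this than unzip by hand, the same idea can be sketched in Python with only the standard library. This is a rough sketch, not a full reader: the function name `read_sheet_rows` and the default part name `xl/worksheets/sheet1.xml` are assumptions for illustration, so check the actual names inside your own files. It also ignores inline strings, and since empty cells are normally left out of the XML, columns can shift within a row.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the worksheet and shared-string XML parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_sheet_rows(xlsx_file, sheet_path="xl/worksheets/sheet1.xml"):
    """Return the cell values of one worksheet as a list of rows.

    xlsx_file may be a path or a file-like object; sheet_path is an
    assumed default and may differ in your workbooks.
    """
    with zipfile.ZipFile(xlsx_file) as archive:
        # Text cells usually hold an index into xl/sharedStrings.xml.
        shared = []
        if "xl/sharedStrings.xml" in archive.namelist():
            root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # A string may be split into several <t> runs; join them.
                runs = si.iter("{%s}t" % NS["m"])
                shared.append("".join(t.text or "" for t in runs))

        sheet = ET.fromstring(archive.read(sheet_path))
        rows = []
        for row in sheet.findall("m:sheetData/m:row", NS):
            values = []
            for cell in row.findall("m:c", NS):
                v = cell.find("m:v", NS)
                if v is None:
                    # No stored value (for example an inline string cell).
                    values.append(None)
                elif cell.get("t") == "s":
                    # Shared string: <v> is an index into the string table.
                    values.append(shared[int(v.text)])
                else:
                    # Numbers, dates and so on are returned as raw text.
                    values.append(v.text)
            rows.append(values)
        return rows
```

For thousands of identical files you could loop over the paths (for example with `glob.glob("*.xlsx")`) and call `read_sheet_rows(path)` on each one. All values come back as strings or `None`, so convert numbers yourself where needed.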