Perl doc/pdf/xls to HTML converter


I would like to convert files with the extensions doc/docx/xls/xlsx/pdf to HTML files. Is there a simple way to do that on Solaris using Perl?


Solution

  • The Perl libraries I've used for processing Microsoft Office files have been pretty lacking, and I have yet to find any that do a good job of handling the Office 2007 and Office 2010 file formats (docx/xlsx). Please point to one in the comments if you know of one!

    If you have a PC running Microsoft Office, you can use Win32 OLE automation to control the Office application from Unix. I've done it before with Ruby: http://rubyonwindows.blogspot.com/2007/03/automating-excel-with-ruby.html

    Here's a Perl module for Win32 OLE automation (the module itself runs on the Windows side): http://metacpan.org/pod/Win32::OLE

    I personally don't recommend the OLE approach because it brings lots of headaches: you have to leave Office running on the PC for the Unix script to work, and Windows Firewall may block the Unix script, seemingly at random, as your PC gets updated with patches.

    I haven't tried this, but here's a Java program that uses OpenOffice and Ghostscript to do batch conversions for you: http://www.codeproject.com/KB/java/PDFCM.aspx
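
If an office suite is installed on the Solaris box itself, the same idea can be driven from Perl by shelling out to it. What follows is only a minimal sketch under several assumptions, not something taken from the program linked above: it assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless --convert-to html --outdir`, the helper names `build_convert_command` and `convert_to_html` are made up for illustration, and the `SOFFICE` environment variable is a hypothetical override. PDF input is deliberately left out, since this route is aimed at the Office formats.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Assumption: a LibreOffice-style "soffice" is on the PATH.
# SOFFICE is a hypothetical environment override for its location.
my $SOFFICE = $ENV{SOFFICE} || 'soffice';

# Extensions this sketch hands to the office suite (PDF is not handled here).
my %SUPPORTED = map { $_ => 1 } qw(doc docx xls xlsx);

# Build the argument list for one conversion.
# Returns an empty list if the file extension is not supported.
sub build_convert_command {
    my ($input, $outdir) = @_;
    my (undef, undef, $suffix) = fileparse($input, qr/\.[^.]+$/);
    (my $ext = lc $suffix) =~ s/^\.//;
    return unless $SUPPORTED{$ext};
    return ($SOFFICE, '--headless', '--convert-to', 'html',
            '--outdir', $outdir, $input);
}

# Run the conversion and return the path where the HTML file is expected.
sub convert_to_html {
    my ($input, $outdir) = @_;
    my @cmd = build_convert_command($input, $outdir)
        or die "Unsupported file type: $input\n";
    # List-form system() avoids the shell, so odd file names are passed safely.
    system(@cmd) == 0
        or die "Conversion failed for $input (exit status $?)\n";
    my ($name) = fileparse($input, qr/\.[^.]+$/);
    return File::Spec->catfile($outdir, "$name.html");
}

# Usage: perl convert.pl OUTDIR FILE [FILE ...]
if (@ARGV) {
    my ($outdir, @files) = @ARGV;
    print convert_to_html($_, $outdir), "\n" for @files;
}
```

Keeping the command construction separate from the `system` call makes it easy to check which arguments would be passed before anything is actually converted, which helps when the exact options of the installed office suite differ from the ones assumed here.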