Search code examples
c#asp.netbytefilestreamdocx

How to convert office file to image


I am searching from last two days but did not find any thing.

My requirement is to create a document viewer in my web application (C#.Net) and I don't want to use any third party tool for this. Can I convert the files in image or PDF or in any common formate which can be easly render on web page. I also can not use Introp object.

Any help will be highly appreciated


Solution

  • You mention in one of your comments that you'd like to write all the code yourself but don't know where to start. Here's how I would go about it...

    First, you'll need to familiarize yourself with the Microsoft Office Format specification. You can find that here (there's a link to the technical specification). Office documents are actually a .zip file with an XML file inside along with any binary data representing attachments. Just renamed a .docx file as .zip and you'll be able to open it up and see the XML and any other supporting documents inside (same is true for xlsx, etc...).

    Then you'll need to become intimately familiar with either PDF or HTML, as your job now will be to convert the various Office document structure into PDF or HTML structure, being sure to respect page layout, margins, order, etc...

    As others have said, this is a large task which is why third party tools exist today. Also, each third party toolset has it's limitation as this is really hard to "get right" in all situations and there will be edge cases that work for one document and not another (because maybe they didn't use Microsoft Word to save the .docx, maybe they used OpenOffice and OpenOffice interpreted the standard slightly differently...)