For an intranet portal, we need to convert MS Office documents to PDF in real time when someone provides a link to a document, after first checking whether the user is authorized to view it. We also need to cache the converted documents based on each document's last-modified date: if another user requests the same document and its content has not changed since it was last converted, we should not convert it again.
I have some basic questions about how to implement this, and would like to hear from anyone who has previous experience or thoughts on how it could be done.
For example, if we choose J2EE as the technology and one of the open source Java libraries for PDF conversion, I have the following questions.
Thanks
I work for a company that creates a product that does exactly what you are trying to do using Java / .NET Web service calls, so let me see if I can answer your questions without bias.
The whole document will need to be downloaded, as it has to be interpreted in full before PDF conversion can take place (e.g. for page numbering purposes). I am sure you are just giving an example, but 100MB is very large for an MS-Office document, although we do see it from time to time.
You can implement caching based on your exact security requirements. If you don't want to store the converted files in a (secured) DB or file system, then perhaps you want to store them on a different server behind a firewall. Depending on the number and size of the documents you anticipate, you may want to cache them in memory. I am sure there are many J2EE caching libraries available, and I know there are plenty in .NET. Just keep the most frequently requested documents in your cache.
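As a rough illustration of the in-memory option, here is a minimal Java sketch of a cache that reuses a converted PDF only while the source document's last-modified timestamp is unchanged. The class name `PdfCache`, the method `getOrConvert`, and the `converter` callback are all hypothetical names made up for this example, and it evicts the least recently used entry rather than tracking request frequency. It assumes the authorization check has already happened before the cache is consulted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical in-memory cache of converted PDFs, keyed by a document ID. */
class PdfCache {

    /** A converted PDF plus the source's last-modified time when it was converted. */
    private static final class CachedPdf {
        final long sourceLastModified;
        final byte[] pdf;

        CachedPdf(long sourceLastModified, byte[] pdf) {
            this.sourceLastModified = sourceLastModified;
            this.pdf = pdf;
        }
    }

    private final Map<String, CachedPdf> entries;

    PdfCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache is over capacity.
        this.entries = new LinkedHashMap<String, CachedPdf>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedPdf> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Returns the cached PDF if the source has not changed since it was
     * converted; otherwise runs the converter and stores the new result.
     * Synchronized for simplicity: conversions are serialized in this sketch.
     */
    synchronized byte[] getOrConvert(String docId, long lastModified, Supplier<byte[]> converter) {
        CachedPdf cached = entries.get(docId);
        if (cached != null && cached.sourceLastModified == lastModified) {
            return cached.pdf; // source unchanged, reuse the earlier conversion
        }
        byte[] pdf = converter.get(); // the expensive Office-to-PDF step
        entries.put(docId, new CachedPdf(lastModified, pdf));
        return pdf;
    }
}
```

A caller would check the user's permissions first, read the document's last-modified timestamp from wherever the document lives, and then call `getOrConvert` with a callback that performs the actual conversion. Holding one lock across the conversion keeps the sketch short, but a busy portal would likely want per-document locking instead.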
Depending on your budget, you may go for an out-of-the-box product (hint hint :-). I know there are free libraries available for Java that leverage Open Office, but you get the same formatting limitations as when opening MS-Office files in Open Office itself. Be careful when trying to do your own MS-Office integration / automation. It is possible to make it reliable and scalable (we did), but it takes a long time and a lot of work.
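To give an idea of what the free route can look like, here is a minimal Java sketch that shells out to a headless office suite. It is an assumption on my part that LibreOffice (a descendant of Open Office) is installed and that its `soffice` executable is on the PATH; the class and method names (`OfficeToPdf`, `buildCommand`, `expectedPdfPath`, `convert`) are hypothetical. The formatting limitations mentioned above still apply.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper around LibreOffice's headless PDF conversion. */
class OfficeToPdf {

    /** Builds the command line; assumes "soffice" is on the PATH. */
    static List<String> buildCommand(Path input, Path outputDir) {
        return Arrays.asList(
                "soffice", "--headless",
                "--convert-to", "pdf",
                "--outdir", outputDir.toString(),
                input.toString());
    }

    /** LibreOffice writes <input base name>.pdf into the output directory. */
    static Path expectedPdfPath(Path input, Path outputDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".pdf");
    }

    /** Runs the conversion and fails on a timeout or a non-zero exit code. */
    static Path convert(Path input, Path outputDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputDir))
                .inheritIO() // avoid blocking on an unread output pipe
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Conversion timed out: " + input);
        }
        if (process.exitValue() != 0) {
            throw new IOException("Conversion failed for " + input
                    + " (exit code " + process.exitValue() + ")");
        }
        return expectedPdfPath(input, outputDir);
    }
}
```

The timeout matters in practice: a converter process that hangs on a problematic document should not tie up a request thread indefinitely. Starting a new process per document is also slow, which is one reason a reliable and scalable setup takes real work.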
I hope this helps.