
Batch converting doc/docx to PDF using JavaScript


I'm working on a Java program that converts .doc and .docx files to PDF. I've tried several approaches, including a number of open source Java libraries, but these libraries often messed up the document layout.

I then stumbled upon a JavaScript script (run through Windows Script Host) that uses the installed Microsoft Word instance to open the file and save it as a PDF (found at: https://superuser.com/questions/17612/batch-convert-word-documents-to-pdfs-free/28303#28303):

// Usage: pass the source document path and the target PDF path as arguments.
var fso = new ActiveXObject("Scripting.FileSystemObject");
var docPath = WScript.Arguments(0);
var pdfPath = WScript.Arguments(1);
// Word needs an absolute path to open the document.
docPath = fso.GetAbsolutePathName(docPath);
var objWord = null;
try {
    WScript.Echo("Saving '" + docPath + "' as '" + pdfPath + "'...");
    // Start a hidden Word instance through COM automation.
    objWord = new ActiveXObject("Word.Application");
    objWord.Visible = false;
    var objDoc = objWord.Documents.Open(docPath);
    var wdFormatPdf = 17; // WdSaveFormat value for PDF
    objDoc.SaveAs(pdfPath, wdFormatPdf);
    objDoc.Close();
    WScript.Echo("The CV was successfully converted.");
} catch (err) {
    WScript.Echo("An error occurred: " + err.message);
} finally {
    // Always shut Word down, even if the conversion failed.
    if (objWord != null) {
        objWord.Quit();
    }
}

This script is called from my Java program synchronously, once for each document.
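
The Java side of the call is not shown in the post. A minimal sketch of such a synchronous call might look like the following, assuming the script is run with `cscript.exe` found on the PATH. The class name `ConvertCommand`, its methods, and the file name `convert.js` are hypothetical and only serve as illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; names are illustrative and not taken from the original post. */
class ConvertCommand {

    /** Builds the command line that runs the conversion script for one document. */
    static List<String> build(String scriptPath, String docPath, String pdfPath) {
        // "//NoLogo" suppresses the Windows Script Host banner.
        return Arrays.asList("cscript.exe", "//NoLogo", scriptPath, docPath, pdfPath);
    }

    /** Starts the command and blocks until it exits; returns the exit code. */
    static int runBlocking(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the script's output to this program's console
                .start();
        // Blocks indefinitely if the script never exits, e.g. when Word shows a prompt.
        return process.waitFor();
    }
}
```

A call such as `ConvertCommand.runBlocking(ConvertCommand.build("convert.js", "cv.docx", "cv.pdf"))` has no time limit, which is why a single stuck Word prompt can stall the whole batch.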

On a small scale this works well, but when converting several thousand documents I ran into a couple of problems:

  • Sometimes a Word process would hang at the 'Save As' prompt. The process then blocked until a user intervened.
  • Sometimes a Word process would hang at a 'Bookmark' prompt. Again, the process blocked until a user dismissed the prompt.

I'm looking for the best/cleanest way to control these Word processes, for example by giving them a deadline: allow 5 seconds to open the Word document and save it as a PDF, and kill the process if it is still active after that.

I've dealt with something similar in the past, and the solution then included a 'kill Word processes' batch script that killed any Word processes still stuck after the program ended. Not very clean, but it did its job.

Any experiences or ideas would be appreciated!


Solution

  • I managed to get around the issue of the process getting stuck at a Microsoft Word prompt. In my final solution I changed my Java code to start the JavaScript script in a separate Thread. The main Thread then sleeps for a few seconds and afterwards checks on the other Thread.

    The other Thread keeps a reference to the Process instance it uses to run the script. The main Thread checks the exitValue of that Process; if the script is stuck at a Microsoft Word prompt, the Process has not terminated yet and an IllegalThreadStateException is thrown. I handle the exception by killing the process and cleaning up any temporary files left by Microsoft Word.
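
The deadline idea can be sketched as follows. This is not the exact code from the answer: instead of sleeping and then calling `exitValue()`, the sketch uses `Process.waitFor(timeout, unit)` (Java 8+), which returns as soon as the process exits or the deadline passes. The class name `DeadlineRunner` and its method are hypothetical, and `ProcessBuilder.Redirect.DISCARD` assumes Java 9 or later.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; names are illustrative and not taken from the original post. */
class DeadlineRunner {

    /**
     * Runs a command and waits at most timeoutMillis for it to exit.
     * Returns true if the process exited on its own within the deadline,
     * false if it was still running and had to be killed.
     */
    static boolean runWithDeadline(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                // Discard output so a full pipe buffer can never block the child (Java 9+).
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();

        // Returns early if the process exits before the deadline.
        boolean finished = process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
        if (!finished) {
            process.destroyForcibly(); // kill the stuck process
            process.waitFor();         // wait until it has actually terminated
        }
        return finished;
    }
}
```

When this returns false, the caller would still need to clean up any temporary files left behind, as described above. Killing the script host may also leave the Word process it started running, so a separate cleanup of leftover Word processes may still be necessary.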