Tags: c#, vb.net, ms-word, upgrade, docx

Is there a way to upgrade Word documents to the 2010 format?


Scenario: I have about 14,000 Word documents that need to be converted from "Microsoft Word 97 - 2003 Document" to "Microsoft Word Document"; in other words, upgraded to the 2010 format (.docx).

Question: Is there an easy way to do this using an API or some other automated approach?

Note: The only tool I've found is a Microsoft program that converts the documents to .docx, but they still open in compatibility mode. It would be nice if they could simply be converted to the new format, the same result you get when you open an old document in Word and choose the option to convert it.

Edit: I just found http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word._document.convert.aspx (the _Document.Convert method) and am looking into how to use it.
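A minimal sketch of how _Document.Convert might be paired with an explicit save, assuming the Word 2010 primary interop assembly (Microsoft.Office.Interop.Word, version 14) is referenced. DocxConversion and ConvertToDocx are hypothetical names. The document is written out with SaveAs2 using WdSaveFormat.wdFormatXMLDocument and CompatibilityMode:=WdCompatibilityMode.wdWord2010, so the result is a .docx flagged for Word 2010 mode regardless of whether Convert by itself persists anything; the original .doc is left in place.

```vbnet
Imports System.IO
Imports Word = Microsoft.Office.Interop.Word

Module DocxConversion

    ' Hypothetical helper: opens one legacy .doc in an existing Word instance,
    ' upgrades it with Convert, and saves a .docx copy next to the original.
    ' Returns the path of the .docx that was written.
    Public Function ConvertToDocx(wordApp As Word.Application, docPath As String) As String
        Dim newPath As String = Path.ChangeExtension(docPath, ".docx")
        Dim doc As Word.Document = Nothing
        Try
            doc = wordApp.Documents.Open(FileName:=docPath, AddToRecentFiles:=False)

            ' Upgrade the open document to the newest file format (the API from the question).
            doc.Convert()

            ' Write it out explicitly as .docx, flagged for Word 2010 mode
            ' rather than compatibility mode (assumes the Word 2010 object model).
            doc.SaveAs2(FileName:=newPath,
                        FileFormat:=Word.WdSaveFormat.wdFormatXMLDocument,
                        CompatibilityMode:=Word.WdCompatibilityMode.wdWord2010)
        Finally
            ' doc stays Nothing if Open failed, so only close what was opened.
            If doc IsNot Nothing Then
                doc.Close(SaveChanges:=Word.WdSaveOptions.wdDoNotSaveChanges)
            End If
        End Try
        Return newPath
    End Function

End Module
```

The caller would create one Word.Application, call ConvertToDocx for each path, and call Quit once at the end, so Word's start-up cost is paid only once.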

EDIT2: This is my current code for converting the documents:

Private Sub btnConvert_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnConvert.Click
    FolderBrowserDialog1.ShowDialog()
    If Not String.IsNullOrEmpty(FolderBrowserDialog1.SelectedPath) Then
        lstFiles.Clear()

        DirSearch(FolderBrowserDialog1.SelectedPath)
        ThreadPool.SetMaxThreads(1, 1)
        lstFiles.RemoveAll(Function(y) y.Contains(".docx"))
        TextBox1.Text += "Conversion started at " & DateTime.Now().ToString & Environment.NewLine
        For Each x In lstFiles
            ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ConvertDoc), x)
        Next

    End If
End Sub
Private Sub ConvertDoc(ByVal path As String)
    ' Each call starts its own Word instance.
    Dim word As New Microsoft.Office.Interop.Word.Application
    Dim doc As Microsoft.Office.Interop.Word.Document = Nothing
    word.Visible = False

    Try
        Debug.Print(path)
        ' All other Open parameters are optional and can be omitted in VB.NET.
        doc = word.Documents.Open(path)
        doc.Convert()

    Catch ex As Exception
        ' Log failures instead of silently ignoring them.
        Debug.Print("Failed to convert " & path & ": " & ex.Message)
    Finally
        ' doc is still Nothing if Open threw, so guard before closing.
        If doc IsNot Nothing Then doc.Close()
        word.Quit()
    End Try

End Sub

It lets me select a path and then finds all .doc files within its subfolders. That code (DirSearch) isn't important; all the files to convert end up in lstFiles. The only problem at the moment is that it takes a very long time to convert even 10 documents. Should I be using one Word application per document instead of reusing one? Any suggestions?
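For the file-discovery step, a sketch that uses Directory.EnumerateFiles with SearchOption.AllDirectories (available from .NET Framework 4) in place of a hand-written recursive DirSearch. DocFinder and FindDocFiles are hypothetical names. It enumerates every file and checks the extension itself, case-insensitively, because on Windows a three-character pattern such as "*.doc" also matches ".docx". Skipping names that start with "~$" (the temporary owner files Word creates next to open documents) is an extra assumption.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module DocFinder

    ' Returns every legacy .doc file under rootPath, including all subfolders.
    Public Function FindDocFiles(rootPath As String) As List(Of String)
        ' Enumerate everything and filter on the real extension: on Windows the
        ' pattern "*.doc" would also match ".docx" files.
        ' "~$" files are Word's temporary owner files, not real documents.
        Return Directory.EnumerateFiles(rootPath, "*", SearchOption.AllDirectories) _
            .Where(Function(f) String.Equals(Path.GetExtension(f), ".doc", StringComparison.OrdinalIgnoreCase)) _
            .Where(Function(f) Not Path.GetFileName(f).StartsWith("~$", StringComparison.Ordinal)) _
            .ToList()
    End Function

End Module
```

In the form, lstFiles.AddRange(FindDocFiles(FolderBrowserDialog1.SelectedPath)) would replace both the DirSearch call and the RemoveAll line. Like the recursive version, it throws UnauthorizedAccessException if it reaches a folder it cannot read.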

Also, after about 2 or 3 conversions Word becomes visible and starts flashing, but it keeps converting.

EDIT3: Tweaked the code above a little and it runs more cleanly, but it still takes 1 min 10 s to convert 8 files (about 8.75 s per file). With 14,000 files, that works out to roughly 34 hours.

EDIT4: Changed the code again; it now uses a thread pool and seems to run a bit faster. I still need to run it on a better computer to convert all the documents, or convert them slowly, folder by folder. Can anyone think of any other way to optimize this?


Solution

  • I ran your code locally, with just some minor edits for improved tracing and timing, and it "only" took 13.73 seconds to process 12 files, about 1.14 seconds per file. At that rate your 14,000 files would take about 4.5 hours. I'm running Visual Studio 2010 on Windows 7 x64 with a dual-core processor. Perhaps you can just use a faster computer?

    Here's my full code. It is just a form with a single button, Button1, and a FolderBrowserDialog, FolderBrowserDialog1. Unlike the version in the question, it creates a single Word instance and reuses it for every file:

    Imports System.IO

    Public Class Form1

        Dim lstFiles As List(Of String) = New List(Of String)

        ' Recursively collects every .doc file under path (case-insensitive match).
        Private Sub DirSearch(path As String)
            Dim docFiles = From file In Directory.GetFiles(path)
                           Where file.EndsWith(".doc", StringComparison.OrdinalIgnoreCase)
                           Select file

            lstFiles.AddRange(docFiles)

            For Each subdir As String In Directory.GetDirectories(path)
                DirSearch(subdir)
            Next
        End Sub

        Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
            FolderBrowserDialog1.ShowDialog()

            If Not String.IsNullOrEmpty(FolderBrowserDialog1.SelectedPath) Then
                lstFiles.Clear()

                DirSearch(FolderBrowserDialog1.SelectedPath)
                lstFiles.RemoveAll(Function(y) y.Contains(".docx"))

                ' One Word instance, created once and reused for every file.
                Dim word As New Microsoft.Office.Interop.Word.Application
                word.Visible = False
                Dim startTime As DateTime = DateTime.Now
                Debug.Print("Timer started at " & startTime.ToString & Environment.NewLine)
                Try
                    For Each x In lstFiles
                        Debug.Print(x + Environment.NewLine)
                        Dim doc As Microsoft.Office.Interop.Word.Document = word.Documents.Open(x)
                        doc.Convert()
                        doc.Close()
                    Next
                Finally
                    ' Quit Word even if a document fails to open or convert.
                    word.Quit()
                End Try
                Dim endTime As DateTime = DateTime.Now
                Debug.Print("Took " & endTime.Subtract(startTime).TotalSeconds & " seconds to process " & lstFiles.Count & " documents" & Environment.NewLine)
            End If
        End Sub

    End Class
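Building on this answer's single-instance approach, a sketch of a batch converter that also addresses the threading in the question. ConvertDoc there creates a new Word.Application (normally a separate WINWORD.EXE process) for every file, and ThreadPool.SetMaxThreads(1, 1) does not limit the pool on a multi-core machine, because the maximum cannot be set below the processor count (the call returns False and changes nothing), so several Word instances may be starting at once. This sketch assumes the Word 2010 interop assembly: one Word instance, used sequentially on one dedicated thread, with alerts and screen updating off and failures collected instead of swallowed. BatchConverter, ConvertAll and StartInBackground are hypothetical names, and running the worker as STA is an assumption based on how Office automation is commonly hosted.

```vbnet
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO
Imports System.Runtime.InteropServices
Imports System.Threading
Imports Word = Microsoft.Office.Interop.Word

Module BatchConverter

    ' Converts every path in docPaths with ONE Word instance, one file at a time.
    ' Returns the paths that failed so they can be retried or inspected later.
    Public Function ConvertAll(docPaths As IEnumerable(Of String)) As List(Of String)
        Dim failed As New List(Of String)
        Dim wordApp As New Word.Application()
        Try
            wordApp.Visible = False
            ' Keep the hidden instance from stopping on dialogs or redrawing.
            wordApp.DisplayAlerts = Word.WdAlertLevel.wdAlertsNone
            wordApp.ScreenUpdating = False

            For Each docPath As String In docPaths
                Dim doc As Word.Document = Nothing
                Try
                    doc = wordApp.Documents.Open(FileName:=docPath, AddToRecentFiles:=False)
                    doc.Convert()
                    doc.SaveAs2(FileName:=Path.ChangeExtension(docPath, ".docx"),
                                FileFormat:=Word.WdSaveFormat.wdFormatXMLDocument,
                                CompatibilityMode:=Word.WdCompatibilityMode.wdWord2010)
                Catch ex As COMException
                    ' Record the failure instead of swallowing it, then move on.
                    Debug.Print(docPath & ": " & ex.Message)
                    failed.Add(docPath)
                Finally
                    If doc IsNot Nothing Then
                        doc.Close(SaveChanges:=Word.WdSaveOptions.wdDoNotSaveChanges)
                        Marshal.ReleaseComObject(doc)
                    End If
                End Try
            Next
        Finally
            ' Always shut down the single Word process, even after an unexpected error.
            wordApp.Quit(SaveChanges:=Word.WdSaveOptions.wdDoNotSaveChanges)
            Marshal.ReleaseComObject(wordApp)
        End Try
        Return failed
    End Function

    ' Runs ConvertAll on one dedicated STA thread instead of queueing a
    ' thread-pool work item (and a new Word instance) per file.
    Public Function StartInBackground(docPaths As List(Of String)) As Thread
        Dim worker As New Thread(New ThreadStart(Sub() ConvertAll(docPaths)))
        worker.SetApartmentState(ApartmentState.STA)
        worker.Start()
        Return worker
    End Function

End Module
```

In the question's form, StartInBackground(lstFiles) would take the place of the ThreadPool.QueueUserWorkItem loop; ConvertAll can be called directly instead when the list of failures is needed. Any TextBox updates from the worker thread would have to go through Control.Invoke.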