c#, com-interop, office-interop

Writing a C# method to parse Word document text: how do I iterate to get the form fields too?


First off, I have next to no experience with COM references. I'm just experimenting with Microsoft.Office.Interop.Word, so please keep that in mind if my code looks a little overblown. Essentially, I want to extract a string or object that I can either store in a database and parse later, or run logic on directly in C# and then put parts of it into a database.

The essence of the problem is that I'm not sure how, using this library, to get the current form field within a given paragraph. Please look at the code below and let me know if you have any suggestions. I think I just don't know the right property or method to give me an index that would let me finish my code.

Basically, I create a Word application, open the document through it, iterate through the paragraphs in the document, and walk through each paragraph's text character by character. When the character with code 21 comes up, I know it is one of the form fields I want to parse. However, I can't get the index to advance correctly, which is strange: no matter where I declare the int or put the increment, it never seems to change when it should. So I'm at a loss, and I'm also curious whether there is a better or simpler way to do this. I know I could write a separate method that just returns the form field objects apart from the paragraph objects, but that seems odd to me, so I figured I would ask.

I'm using .NET 4.5 and had to manually add the Microsoft.Office.Interop.Word version 15 DLL as a reference, because for some reason upgrading to Office 2013 did not update the references in Visual Studio. Here is my rather messy code:

public static string ReadTest(string loc)
        {
            Word._Application wordApp = new Word.Application();
            Word._Document Doc = wordApp.Documents.Open(loc, ReadOnly: true);

            // Declared outside the try block so it is still in scope at "return sb;"
            string sb = "";

            try
            {

                // This will get me JUST THE FORMS info
                //foreach (Word.FormField form in Doc.FormFields)
                //{
                //    sb += form.Result + "\n";
                //}


                int x = 1;

                foreach (Word.Paragraph objParagraph in Doc.Paragraphs)
                {
                    string st = "";

                    try
                    {
                        foreach (char c in objParagraph.Range.Text)
                        {
                            if (((int)c) != 21)
                            {
                                st += c;
                            }
                            else
                            {
                                st += Doc.FormFields.get_Item(x).Result;
                            }
                        }

                        sb += st + "\n";

                    }
                    catch (Exception)
                    {
                        // "throw;" rethrows without resetting the stack trace, unlike "throw ex;"
                        throw;
                    }

                    x += 1;

                }

            }
            catch (COMException) { }
            finally
            {
                //FileInfo finfo = new FileInfo(loc);
                //finfo.IsReadOnly = false;

                if (Doc != null)
                {
                    Doc.Close();
                    Doc = null;
                }
                if (wordApp != null)
                {
                    wordApp.Quit(Word.WdSaveOptions.wdDoNotSaveChanges);
                    wordApp = null;
                }
            }

            GC.Collect();
            GC.WaitForPendingFinalizers();

            return sb;
        }

Solution

  • The right place for the x increment is definitely the line immediately after the one where you access the form field:

                            else
                            {
                                st += Doc.FormFields.get_Item(x).Result;
                                x++;
                            }
    

    I don't know whether you already tried putting it there, but the code you posted can only work when your document has exactly one form field in every paragraph.

    In fact, if a paragraph contains more than one field, x hasn't been incremented yet when you reach the second, third, and later 21 characters in it, so you end up reading the same field every time.

    If, for instance, there is one field in the first paragraph and another in the third, your code finds the first one, increments x, and then reads the second paragraph with x = 2 without finding any 21 character. Then x is incremented again, and you scan the third paragraph with x = 3 even though there are only 2 fields, so when you reach the 21 character you look up a field (the third one) that doesn't exist.

    PS: It would be much easier to help you with a sample document.
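To make the counter argument concrete without needing Word, here is a minimal sketch that models the loop on plain strings. It is an illustration only: the class and method names are hypothetical, paragraph texts stand in for objParagraph.Range.Text, and a plain list of strings stands in for the Result values of Doc.FormFields. It also assumes the 21 characters are met in the same order that Doc.FormFields lists the fields. One method reports which field index each marker would request with the increment once per paragraph (the original placement); the other applies the increment right after each field is read.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical, Word-free model of the loop in the question: paragraph texts
// and form-field results are plain strings, so only the counter logic is exercised.
public static class FieldCounterDemo
{
    // Character code the question treats as a form-field marker.
    private const char FieldMark = (char)21;

    // Original placement: x advances once per paragraph.
    // Returns the 1-based field index each marker would request.
    public static List<int> RequestedIndicesOriginal(IEnumerable<string> paragraphs)
    {
        var requested = new List<int>();
        int x = 1;
        foreach (string paragraph in paragraphs)
        {
            foreach (char c in paragraph)
            {
                if (c == FieldMark)
                {
                    requested.Add(x); // stands in for Doc.FormFields.get_Item(x)
                }
            }
            x += 1; // advances per paragraph, not per field
        }
        return requested;
    }

    // Suggested placement: x advances right after each field is read.
    // fieldResults[x - 1] stands in for Doc.FormFields.get_Item(x).Result.
    public static string MergeFixed(IEnumerable<string> paragraphs, IList<string> fieldResults)
    {
        var sb = new StringBuilder();
        int x = 1;
        foreach (string paragraph in paragraphs)
        {
            foreach (char c in paragraph)
            {
                if (c != FieldMark)
                {
                    sb.Append(c);
                }
                else
                {
                    sb.Append(fieldResults[x - 1]);
                    x++; // the next marker, even in the same paragraph, gets the next field
                }
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

With fields only in the first and third paragraphs (`{ "\u0015", "text", "\u0015" }`), `RequestedIndicesOriginal` yields 1 and 3, so the second lookup asks for a third field when only two exist; with two markers in one paragraph it yields 1 twice. `MergeFixed` on the same three paragraphs with results `{ "f1", "f2" }` returns `"f1\ntext\nf2\n"`, because x only moves when a field is actually consumed.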