Tags: c#, ms-word, office-interop, office-2003, office-automation

How to get text from line number in MS Word


Is it possible to get the text (a line or a sentence) at a given line number in MS Word using Office automation? It's fine if I can get either the text on that line or the sentence(s) that the line is part of.

I am not providing any code because I have no idea how an MS Word document is read using Office automation. I can open the file like this:

var wordApp = new ApplicationClass();
wordApp.Visible = false;
object file = path;
object misValue= Type.Missing; 
Word.Document doc = wordApp.Documents.Open(ref file, ref misValue, ref misValue,
                                           ref misValue, ref misValue, ref misValue,
                                           ref misValue, ref misValue, ref misValue,
                                           ref misValue, ref misValue, ref misValue);

// ...and the rest of the code, given that I have a line number (say, 3)?

Edit: To address @Richard Marskell - Drackir's doubt: although the text in an MS Word document is one long string, Office automation can still report line numbers. In fact, I get the line number itself from another piece of code, like this:

Word.Revision rev = //SomeRevision
object lineNo = rev.Range.get_Information(Word.WdInformation.wdFirstCharacterLineNumber);

For instance, say the Word file looks like this:

fix grammatical or spelling errors

clarify meaning without changing it correct minor mistakes add related resources or links
always respect the original author

Here there are 4 lines.


Solution

  • Fortunately, after some epic searching, I found a solution.

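        // Requires: using System.IO; using System.Reflection; using System.Windows.Forms;
        //           using Word = Microsoft.Office.Interop.Word;
        object file = Path.GetDirectoryName(Application.ExecutablePath) + @"\Answer.doc";
    
        Word.Application wordObject = new Word.ApplicationClass();
        wordObject.Visible = false;
    
        // Placeholder passed for every optional COM parameter.
        object nullobject = Missing.Value;
        Word.Document docs = wordObject.Documents.Open
            (ref file, ref nullobject, ref nullobject, ref nullobject,
            ref nullobject, ref nullobject, ref nullobject, ref nullobject,
            ref nullobject, ref nullobject, ref nullobject, ref nullobject,
            ref nullobject, ref nullobject, ref nullobject, ref nullobject);
    
        String strLine;
        bool bolEOF = false;
    
        // Start with the selection on the first character of the document.
        docs.Characters[1].Select();
    
        int index = 0; // becomes the 1-based line number after ++index below
        do
        {
            // Extend the selection to the end of the current line.
            object unit = Word.WdUnits.wdLine;
            object count = 1;
            wordObject.Selection.MoveEnd(ref unit, ref count);
    
            // The selection now spans one line of the document.
            strLine = wordObject.Selection.Text;
            richTextBox1.Text += ++index + " - " + strLine + "\r\n"; //for our understanding
    
            // Collapse to the end so the next pass starts on the following line.
            object direction = Word.WdCollapseDirection.wdCollapseEnd;
            wordObject.Selection.Collapse(ref direction);
    
            // Stop once the selection reaches the predefined \EndOfDoc bookmark.
            if (wordObject.Selection.Bookmarks.Exists(@"\EndOfDoc"))
                bolEOF = true;
        } while (!bolEOF);
    
        // Close the document and quit Word so no hidden Word process is left running.
        docs.Close(ref nullobject, ref nullobject, ref nullobject);
        wordObject.Quit(ref nullobject, ref nullobject, ref nullobject);
        docs = null;
        wordObject = null;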
    

    Here's the genius behind the code. Follow the link for some more explanation on how it works.
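The loop above walks every line in the document. If only one line is needed, a shorter route may be to jump straight to it with `Selection.GoTo` and read Word's predefined `\Line` bookmark, which covers the line the selection is on. The following is a minimal sketch, not part of the original answer: the class and method names (`WordLineReader`, `GetLineText`) are hypothetical, and it assumes the line number counts from the start of the document. As far as I know, `wdFirstCharacterLineNumber` counts from the top of the page, so for multi-page documents the number from the question may need adjusting first.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class WordLineReader
{
    // Hypothetical helper (not from the original answer): returns the text
    // of the given 1-based line, counted from the start of the document.
    public static string GetLineText(Word.Application app, Word.Document doc, int lineNumber)
    {
        object what = Word.WdGoToItem.wdGoToLine;
        object which = Word.WdGoToDirection.wdGoToAbsolute;
        object count = lineNumber;
        object missing = Type.Missing;

        // Selection belongs to the active document, so activate ours first.
        doc.Activate();

        // Move the insertion point to the start of the requested line.
        app.Selection.GoTo(ref what, ref which, ref count, ref missing);

        // "\Line" is a predefined bookmark spanning the current line.
        object lineBookmark = @"\Line";
        Word.Range lineRange = doc.Bookmarks.get_Item(ref lineBookmark).Range;
        return lineRange.Text;
    }
}
```

With the variables from the answer this would be called as `string text = WordLineReader.GetLineText(wordObject, docs, 3);`. The returned text can end with a paragraph mark (`\r`) when the line is the last one of a paragraph, so trimming it may be needed. Line breaks depend on Word's layout (printer, fonts, view), so the same number can map to different text on another machine.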