python-3.x python-docx

Find a string by index in docx document with docx module


I am looking through a docx document using the docx module and regex.

I have found the text immediately before the string I actually want to extract. How can I reference the next string? Can I use the index at all?

for table in wordDoc.tables:
    for row in table.rows:
        for cell in row.cells:
            # grabbing the Payment Total Amount
            if 'Total Payment Amount:' in cell.text:
                print(cell.text)
                print(cell.text.index)


Output:

Total Payment Amount:
<built-in method index of str object at 0x000001F9376D26C0>

Solution

  • Something like this should give you the idea:

    >>> text = "The quick brown fox"
    >>> key = "quick"
    >>> start = text.index(key)
    >>> start
    4
    >>> text[start:]
    'quick brown fox'
    >>> text[start+len(key):]
    ' brown fox'
    

    A few of the finer points:

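    • .index() is a method, not a property, so you need to call it with the key you're after, e.g. text.index(key). Printing cell.text.index without calling it only shows the method object, as in your output. Note that .index() raises ValueError if the key isn't in the string.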

    • .index() gives you the starting offset of the key within the string, so you need to add the length of the key to locate the suffix.

    • "Slicing" a string to get a suffix is accomplished with an open-ended range (e.g. s[n:]). Search for "python string slice" to find more on how that works.

    • You may need to account for spaces between words. Using the .lstrip() method is probably best for that since it works for no spaces, one space, or multiple spaces.

      >>> text[start+len(key):].lstrip()
      'brown fox'
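
Applied to the table loop from the question, the same idea might look like the sketch below. The helper names text_after and find_value_in_tables are made up for illustration and are not part of python-docx. The sketch also assumes that when a cell contains only the label (as the question's output suggests), the value sits in the next cell of the same row; that may not match your document's layout.

```python
def text_after(text, key):
    """Return the text following the first occurrence of key, or None if key is absent."""
    # find() returns -1 when the key is missing, where index() would raise ValueError.
    start = text.find(key)
    if start == -1:
        return None
    # Skip past the key itself, then trim surrounding whitespace.
    return text[start + len(key):].strip()


def find_value_in_tables(doc, key):
    """Scan every table cell of doc for key and return the text that follows it.

    Assumption: if the matching cell holds nothing after the key,
    the value is in the next cell of the same row.
    """
    for table in doc.tables:
        for row in table.rows:
            cells = row.cells
            for i, cell in enumerate(cells):
                value = text_after(cell.text, key)
                if value is None:
                    continue  # key not in this cell
                if value:
                    return value  # value shares the cell with the label
                if i + 1 < len(cells):
                    return cells[i + 1].text.strip()  # fall back to the neighbouring cell
    return None
```

With the wordDoc object from the question, the call would be find_value_in_tables(wordDoc, 'Total Payment Amount:'). It returns None when the label is not found in any table.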