python, string, docx

Python docx - How to extract the text between two strings from a Word form


I am trying to read a form from a Word file, extract a specific string (a path) from it, and create a directory from that path. But I get an error message while searching for the target string.

import docx
path = 'C:/new/form.docx'
doc = docx.Document(path)
        
table = doc.tables[1]
        
for row in table.rows:
            for cell in row.cells:
                # Extract and process cell text
                cell_text = cell.text.strip()
                
print(cell_text)
    
    def between(value, a, b):
        # Find and validate before-part.
        pos_a = value.find(a)
        if pos_a == -1: return ""
        # Find and validate after part.
        pos_b = value.rfind(b)
        if pos_b == -1: return ""
        # Return middle part.
        adjusted_pos_a = pos_a + len(a)
        if adjusted_pos_a >= pos_b: return ""
        return value[adjusted_pos_a:pos_b]
    
    test = "C:\new"
    list = between(cell_text, "Data Source (Only)", "Working Folders")
    print(between(cell_text, "Data Source (Only)", "Working Folders"))
    
   import os
          
   root_path = 'C:/new'
          
   for items in list:
   path = os.path.join(root_path, items)
   os.mkdir(path)

The contents of cell_text look like this:

.....
.....
Data Source (Only)
T:\vendor
Working Folders
(New created)

What I want to do is pick up the path 'T:\vendor', assign it to list, and then use os to create the directory.
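As a quick check of the extraction step, the asker's between() helper can be run on a sample string built from the cell contents shown above. The sample leaves out the "....." lines, which is an assumption about what they contain. The sketch shows two things: the returned text still carries the line breaks on either side of the path, so it needs .strip(), and the result is a single string, so a loop such as for items in list would visit it one character at a time.

```python
def between(value, a, b):
    """Return the text after the first `a` and before the last `b`, or "" if not found."""
    pos_a = value.find(a)
    if pos_a == -1:
        return ""
    pos_b = value.rfind(b)
    if pos_b == -1:
        return ""
    start = pos_a + len(a)
    if start >= pos_b:
        return ""
    return value[start:pos_b]


# Assumed sample cell text: the lines from the question, without the "....." lines.
# The backslash is doubled because "\v" alone would be a vertical-tab escape.
cell_text = "Data Source (Only)\nT:\\vendor\nWorking Folders\n(New created)"

raw = between(cell_text, "Data Source (Only)", "Working Folders")
print(repr(raw))   # '\nT:\\vendor\n' (the surrounding line breaks are kept)

folder = raw.strip()
print(folder)      # T:\vendor

# A str is iterable character by character, so wrap it in a list before looping over it.
folders = [folder] if folder else []
print(folders)     # ['T:\\vendor']
```

The same escape issue affects test = "C:\new" in the question: there \n is a newline character, so the path should be written as r"C:\new" or "C:/new".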

Error message:

File "C:\test.py", line 14
    def between(value, a, b):
    ^
IndentationError: unexpected indent
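The error can be reproduced without python-docx or the form file. Compiling only the source text shows that Python rejects an indented def at module level when it does not follow a block opener such as for or if. The helper name check is hypothetical; compile() is the built-in function.

```python
# Mirrors the question: a top-level print() followed by an indented def.
bad_source = (
    "print('cell text')\n"
    "\n"
    "    def between(value, a, b):\n"
    "        return value\n"
)

# The same code with the def moved back to column 0.
good_source = (
    "print('cell text')\n"
    "\n"
    "def between(value, a, b):\n"
    "    return value\n"
)


def check(source, filename="test.py"):
    """Compile source without running it; return None if OK, else (message, line number)."""
    try:
        compile(source, filename, "exec")
    except IndentationError as err:
        return err.msg, err.lineno
    return None


print(check(bad_source))   # ('unexpected indent', 3)
print(check(good_source))  # None
```

From a shell, python -m py_compile test.py runs the same syntax check on the real file without executing it.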

Solution

  • This error occurs because there is extra whitespace before your function definition. Remove it:

    def between(value, a, b):
    

    Your code must look like this:

    print(cell_text)
    
    def between(value, a, b):
    

    Not this:

    print(cell_text)
    
      def between(value, a, b):
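Moving def back to column 0 fixes the first error, but every later top-level line in the question (test = ..., list = ..., print(...), import os, root_path = ... and the for loop) must also start at column 0, and the body of the for loop must be indented under it; otherwise Python stops at the next mismatched line. A minimal corrected sketch of the whole script follows. It keeps the between() helper, checks each cell as it is read instead of only the last value left in cell_text, and creates a single folder. Assumptions: the form is doc.tables[1] as in the question; python-docx is installed (it is imported inside find_folder so the other helpers run without it); and os.makedirs(..., exist_ok=True) replaces os.mkdir so a second run does not fail. The names START, END, find_folder, make_folder and main are illustrative.

```python
import os

START = "Data Source (Only)"
END = "Working Folders"


def between(value, a, b):
    """Return the text after the first `a` and before the last `b`, or "" if not found."""
    pos_a = value.find(a)
    if pos_a == -1:
        return ""
    pos_b = value.rfind(b)
    if pos_b == -1:
        return ""
    start = pos_a + len(a)
    if start >= pos_b:
        return ""
    return value[start:pos_b]


def find_folder(docx_path, table_index=1):
    """Return the stripped text between START and END in the first matching cell, or ""."""
    import docx  # python-docx; imported here so the other helpers work without it

    doc = docx.Document(docx_path)
    table = doc.tables[table_index]
    for row in table.rows:
        for cell in row.cells:
            # Check each cell as it is read, instead of only the last cell_text value.
            folder = between(cell.text, START, END).strip()
            if folder:
                return folder
    return ""


def make_folder(root_path, name):
    """Create root_path joined with name (plus any missing parents) and return that path."""
    path = os.path.join(root_path, name)
    os.makedirs(path, exist_ok=True)
    return path


def main(docx_path="C:/new/form.docx", root_path="C:/new"):
    """Read the form, extract the folder text and create it; return the path or None."""
    folder = find_folder(docx_path)
    if not folder:
        print("No text found between", repr(START), "and", repr(END))
        return None
    return make_folder(root_path, folder)
```

On the machine that has the form, call main(). On Windows, os.path.join returns its second argument unchanged when that argument is an absolute path with a drive, so os.path.join("C:/new", "T:\\vendor") is T:\vendor and the folder is created on T: rather than under C:/new. If a subfolder of C:/new is wanted instead, pass only the last component, for example os.path.basename(folder).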