python, docx

Iterate through a list of docx files to extract and process tables


I have 3000 docx files in several directories and subdirectories. I need to prepare a list consisting of each filename and the information extracted from the tables in that file. I have already collected all the relevant docx files in the list targets_in_dir, separating them from the non-relevant files.

Question: I would like to iterate through targets_in_dir and extract all tables from each docx file:

len_target =len(targets_in_dir)
file_processed=[]
string_tables=[]

for i in len_target:

    doc = docx.Document(targets_in_dir[i])
    file_processed.append(targets_ind[i])

    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                str.split('MANUFACTURER')
                string_tables.append(cell.text)

I get the error 'int' object is not iterable:

 ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-4847866a9234> in <module>
      4 string_tables=[]
      5 
----> 6 for i in len_target:
      7 
      8     doc = docx.Document(targets_in_dir[i])

TypeError: 'int' object is not iterable

What am I doing wrong?


Solution

  • You are iterating over len_target = len(targets_in_dir), which is an int. An int is not iterable, so the for loop fails with the TypeError shown in the traceback.
    A for loop needs an iterable object, such as a range or the list itself.
    Changing the loop to

    for i in range(len_target):
        # do stuff
    

    or

    for i in targets_in_dir:
        # do stuff
    

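    is a good place to start. Note that in the second form i is each path itself rather than an index, so the loop body would call docx.Document(i) instead of docx.Document(targets_in_dir[i]).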

    Also, file_processed.append(targets_ind[i]) has a typo: targets_ind should be targets_in_dir.
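
Putting both fixes together, the loop from the question could look roughly like the sketch below. It iterates over the paths directly, so no index is needed. A few things here are assumptions rather than part of the question: the function name extract_table_text, the choice to store each cell as a (path, text) pair, and the open_document parameter, which is a hypothetical hook (not part of python-docx) that defaults to docx.Document and only exists so the loop can be tried without real files. The str.split('MANUFACTURER') line is left out because its result was discarded in the original; to split a cell's text you would call cell.text.split('MANUFACTURER') and keep the result.

```python
from types import SimpleNamespace


def extract_table_text(paths, open_document=None):
    """Return (files_processed, cell_texts) for all tables in all files.

    open_document is a hypothetical hook for illustration; by default
    the real docx.Document from python-docx is used.
    """
    if open_document is None:
        import docx  # third-party package: pip install python-docx
        open_document = docx.Document

    files_processed = []
    cell_texts = []
    for path in paths:  # iterate over the list itself, not its length
        doc = open_document(path)
        files_processed.append(path)
        for table in doc.tables:
            for row in table.rows:
                for cell in row.cells:
                    # Assumption: keep the filename next to each cell text.
                    cell_texts.append((path, cell.text))
    return files_processed, cell_texts


def fake_document(path):
    """Stand-in with the same .tables/.rows/.cells/.text attribute shape."""
    cell = SimpleNamespace(text="MANUFACTURER: " + path)
    row = SimpleNamespace(cells=[cell])
    table = SimpleNamespace(rows=[row])
    return SimpleNamespace(tables=[table])


files, cells = extract_table_text(["a.docx", "b.docx"], open_document=fake_document)
print(files)
print(cells)
```

With real files, the call would simply be extract_table_text(targets_in_dir), which falls back to docx.Document for opening each path.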