Can you please help me figure this out? When reading a .docx file with python-docx (docx.Document(file_name)), how can I detect whether the file is valid or corrupt?
I've got some cases where the input docx files are either empty or corrupt. How can I flag these cases using this library?
There is no such feature in python-docx. Part of the reason is that while a file could be determined to be valid or invalid according to the schema in the ISO specification, many small discrepancies are permitted by each client. What is permitted varies between clients; some things that LibreOffice will accept produce a repair error in Microsoft Word, for example.
The only reliable way to determine this is to attempt to open the file with the target client, perhaps using automation like VBA in the case of Microsoft Word.
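That said, if the goal is only to flag files that are empty or grossly damaged, a coarse pre-check with the standard library may be enough. The sketch below is not part of python-docx; `docx_sanity_check` is a hypothetical helper name, and it assumes the main document part is stored at the conventional name `word/document.xml`, which the format does not strictly require. It only confirms that the file is a readable zip package with well-formed XML in that part. A file that passes can still trigger a repair prompt in Word.

```python
import zipfile
import zlib
import xml.etree.ElementTree as ET


def docx_sanity_check(path):
    """Return (ok, reason) for a coarse structural check of a .docx file.

    Hypothetical helper: catches empty files, non-zip files, damaged zip
    members, and malformed XML in the main part. It does NOT validate the
    document against the schema or against any particular client.
    """
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the first bad name.
            bad_member = zf.testzip()
            if bad_member is not None:
                return False, f"damaged zip member: {bad_member!r}"

            names = set(zf.namelist())
            # Assumption: the main part uses the conventional name.
            for required in ("[Content_Types].xml", "word/document.xml"):
                if required not in names:
                    return False, f"missing part: {required!r}"

            # Raises ParseError if the main part is not well-formed XML.
            ET.fromstring(zf.read("word/document.xml"))
    except (OSError, zipfile.BadZipFile, zlib.error) as exc:
        # Covers empty files, missing files, and files that are not zips.
        return False, f"not a readable zip package: {exc}"
    except ET.ParseError as exc:
        return False, f"malformed XML in word/document.xml: {exc}"
    return True, "ok"
```

A call such as `ok, reason = docx_sanity_check(file_name)` could run before `docx.Document(file_name)`, with the `Document` call itself also wrapped in a `try`/`except` so that anything the pre-check misses is still flagged rather than crashing the batch.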