How to get XML from DOC (not DOCX)?

For a DOCX document I do:

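import zipfile
from bs4 import BeautifulSoup

# A DOCX file is a ZIP archive; the main body lives in word/document.xml
document = zipfile.ZipFile(path)
soup = BeautifulSoup(document.read('word/document.xml'), 'html.parser')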

How can I do the same for a DOC document?

Solution

You don't.

DOCX files are tough enough to process, even though they are XML-based and documented by international standards organizations (ECMA-376, ISO/IEC 29500). DOC files are binary and proprietary, so there is no XML inside them to extract.

Don't try to process DOC files directly. Convert them to DOCX first.

See:

Convert .doc to .docx using C#
Automation: how to automate transforming .doc to .docx?
multiple .doc to .docx file conversion using python
Python & MS Word: Convert .doc to .docx?
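
As a rough illustration of the convert-first approach, here is a minimal sketch that assumes LibreOffice is installed and its `soffice` executable is on PATH. This is one possible converter, not the only one, and the linked questions cover others. The function names `build_convert_command` and `doc_to_document_xml` are made up for this example.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts doc_path to DOCX."""
    return [
        soffice,
        "--headless",              # run without opening a GUI window
        "--convert-to", "docx",    # target format
        "--outdir", str(out_dir),  # where the converted file is written
        str(doc_path),
    ]


def doc_to_document_xml(doc_path, out_dir):
    """Convert a .doc file to .docx, then return word/document.xml as bytes."""
    # Assumption: LibreOffice is installed and exposed as "soffice".
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice (soffice) not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    # LibreOffice keeps the base name and swaps the extension.
    docx_path = out_dir / (Path(doc_path).stem + ".docx")
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")
```

The bytes returned by `doc_to_document_xml` can then be passed to BeautifulSoup exactly as in the DOCX snippet from the question.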
