html css ms-word generated-code

Styling Microsoft-Word-Generated HTML


Ugh. Word is notorious for its bloated, convoluted, non-standards-compliant, non-semantic HTML. Unfortunately, I have a professor who requires us to produce an outline to very exacting standards. I'd rather not hand-write it, so I decided to make something that would be useful for my classmates as well. I created the outline as a simple numbered list in NeoOffice on my Mac, exported it as HTML, and wrote quite a bit of CSS to style it. Then I got someone to create an ordered list in Word for Windows, export it as HTML, and send it to me so I could check compatibility. After scrolling miles down the page, trying to repress a shudder, I saw the problem: Word did not use <ol> and <li>. It used mountains of nested <span>s with classes out the wazoo. I hate to see all my work go to waste, but this content is impossible to work with; I'd have to style each document individually rather than with one universal stylesheet.

Ideally, Word would generate HTML using standard tags so that I could style it just like any other list, but this doesn't seem to be the case. How can I make it generate lists that actually use <ol> and <li> rather than <span>, or at least modify my own code to work with the weird way it does create lists?


Solution

  • After some research, it appears that converting the document to HTML isn't practical. Word is too inconsistent in how it saves files and generates HTML, even for a single document, not to mention the differences among versions of Word. As Wyatt suggested, there may be ways to clean up the code, but none of them is perfect. Digging around the API may offer an easier way to parse the output, but in practice it may turn out to be just as convoluted. It seems that using Word as a list-generation tool is simply unrealistic.
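For anyone who still wants to try the clean-up route, here is a rough Python sketch of what it might look like. It rests on an assumption that will not hold for every export: that each list item is a <p> whose style attribute carries something like mso-list:l0 level2 lfo1, with the visible number wrapped in a <![if !supportLists]> ... <![endif]> block. The function name word_paragraphs_to_ol and the regular expressions are my own, not anything Word or a library provides, and regex-based parsing like this is fragile.

```python
import re
from html import escape, unescape

# ASSUMPTION: a Word list item is a <p> whose style contains
# "mso-list:l0 level2 lfo1"; the number after "level" is the nesting depth.
LIST_PARA = re.compile(
    r"<p\b[^>]*mso-list:\s*l\d+\s+level(\d+)[^>]*>(.*?)</p>",
    re.IGNORECASE | re.DOTALL,
)
# ASSUMPTION: the literal "1." / "a." marker sits inside this conditional block.
MARKER = re.compile(r"<!\[if !supportLists\]>.*?<!\[endif\]>", re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def word_paragraphs_to_ol(html: str) -> str:
    """Rebuild nested <ol>/<li> markup from Word-style list paragraphs.

    Inline formatting is discarded; only the plain text of each item is kept.
    """
    out = []
    depth = 0  # number of <ol> elements currently open, each with one open <li>
    for match in LIST_PARA.finditer(html):
        level = max(1, int(match.group(1)))
        body = MARKER.sub("", match.group(2))          # drop the hard-coded marker
        text = " ".join(unescape(TAG.sub("", body)).split())

        while depth > level:                           # moving back out: close lists
            out.append("</li></ol>")
            depth -= 1
        if depth == level:                             # sibling item: close previous
            out.append("</li>")
        while depth < level:                           # moving deeper: open lists
            out.append("<ol>")
            depth += 1
            if depth < level:                          # skipped level needs a wrapper
                out.append("<li>")
        out.append("<li>" + escape(text))

    out.append("</li></ol>" * depth)                   # close whatever is still open
    return "".join(out)
```

Feed it the exported file's contents as a string and it returns a single nested <ol>, which a universal stylesheet can then target. Anything that does not match the assumed paragraph pattern is silently ignored, so check the result against the original document.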