
How can you programmatically (or with a tool) convert .MHT mhtml files to regular HTML and CSS files?


Many tools can export a page as an .MHT file. I want a way to convert that single file into a collection of files (an HTML file plus the relevant images and CSS files) that I could then upload to a web host and that all browsers could display. Does anybody know of any tools, libraries, or algorithms that do this?


Solution

  • Well, you can open the .MHT file in IE and then use Save As to save it as a web page. I tested this with this page, and even though it looked odd in IE (it's IE after all), it saved and then opened fine in Chrome (as in, it looked like it should).

    Barring that method, you can look at the file itself: text content is stored as plain (or quoted-printable) text, and all other content, such as images, is stored in Base64. Each item of content is preceded by:

    [Boundary]
    Content-Type: [Mime Type]
    Content-Transfer-Encoding: [Encoding Type]
    Content-Location: [Full path of content]
    

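    Where [Mime Type], [Encoding Type], and [Full path of content] vary from item to item. [Encoding Type] appears to be either base64 or quoted-printable. [Boundary] is declared at the beginning of the .MHT file, and each delimiter line in the body is that string prefixed with two hyphens (--). The header looks like this: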

    From: <Saved by WebKit>
    Subject: converter - How can you programmatically (or with a tool) convert .MHT mhtml files to regular HTML and CSS files? - Stack Overflow
    Date: Fri, 9 May 2013 13:53:36 -0400
    MIME-Version: 1.0
    Content-Type: multipart/related;
        type="text/html";
        boundary="----=_NextPart_000_0C08_58653ABB.B67612B7"
    

    Using that, you could make your own file parser if needed.
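Rather than hand-rolling the boundary splitting, note that an .MHT file is an ordinary MIME multipart/related message (MHTML, RFC 2557), so a general MIME library can do the parsing and the base64/quoted-printable decoding. The following is a minimal sketch using Python's standard `email` package. It assumes the first text/html part is the page itself, and the `<index>_<basename>` naming scheme and the helper names `local_name` and `convert_mht` are hypothetical choices for illustration, not part of any tool mentioned above.

```python
import email
import email.policy
import mimetypes
import os
import re
from urllib.parse import urlparse


def local_name(location, index, content_type):
    """Hypothetical naming scheme: '<index>_<basename of the Content-Location path>'."""
    path = urlparse(location).path if location else ""
    base = os.path.basename(path) or "part"
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base)  # keep names filesystem-safe
    if "." not in base:
        # No extension in the URL: guess one from the MIME type, if known.
        base += mimetypes.guess_extension(content_type) or ""
    return f"{index}_{base}"


def convert_mht(mht_path, out_dir):
    """Split an .mht file into separate files in out_dir; return the written names."""
    with open(mht_path, "rb") as f:
        msg = email.message_from_binary_file(f, policy=email.policy.default)
    os.makedirs(out_dir, exist_ok=True)

    # Leaf parts only: the multipart/related container itself holds no content.
    parts = [p for p in msg.walk() if not p.is_multipart()]

    # First pass: choose a local filename for every part.
    names = []
    for i, part in enumerate(parts):
        if i == 0 and part.get_content_type() == "text/html":
            names.append("index.html")  # assumption: first HTML part is the page
        else:
            loc = str(part.get("Content-Location", ""))
            names.append(local_name(loc, i, part.get_content_type()))

    # Map each original Content-Location URL to its new local filename.
    rewrites = {}
    for part, name in zip(parts, names):
        loc = str(part.get("Content-Location", ""))
        if loc:
            rewrites[loc] = name

    # Second pass: decode each part and write it to disk.
    for part, name in zip(parts, names):
        data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if part.get_content_maintype() == "text":
            charset = part.get_content_charset() or "utf-8"
            text = data.decode(charset, errors="replace")
            # Naive string replacement (longest URLs first) so HTML/CSS
            # references point at the extracted local files.
            for old in sorted(rewrites, key=len, reverse=True):
                text = text.replace(old, rewrites[old])
            data = text.encode(charset, errors="replace")
        with open(os.path.join(out_dir, name), "wb") as out:
            out.write(data)
    return names
```

For example, `convert_mht("page.mht", "page_files")` would write `index.html` plus one file per embedded resource into `page_files`. Because the link rewriting is a plain text substitution of the full Content-Location URLs, references written differently in the HTML or CSS (relative paths, `cid:` URLs) are left untouched and would need extra handling.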