Tags: c#, html, ms-word, openxml, ole

Embedding Objects with OpenXML in HTML chunks


I'm trying to accomplish something very specific, and I don't even know if it's possible.

The situation is:

  • I have HTML content that, apart from formatted text, contains links to different files (pdf, docx, etc.) on my own server.

  • I'd like to export this HTML content to a docx file using OpenXML, but instead of linking to the server, the files have to be downloaded and embedded into the document as objects.

So far I've achieved:

  • Embedding external files as OLE objects, using OpenXML's EmbeddedObjectPart and then referencing it from a paragraph of the document.

  • Inserting HTML content into the document, using "altChunks".

I've tried to:

  • Reference the embedded object's binary (inside the package) from an HTML link.
  • Use HTML tags like "embed".

Neither approach has worked for me. I don't know whether this is the correct approach or whether I'm doing it correctly. What I don't want to do is embed these files before or after the HTML content, because they are part of it.

Thanks in advance.


Solution

  • I've finally reached a solution that fits my purposes. I'll share it with you:

    • The HTML to insert is not very complex: it comes from a SharePoint rich text field with full HTML enabled, but users only edit it with SharePoint's out-of-the-box (OOTB) editor, so there is no CSS, etc.

    • As a consequence, instead of inserting the HTML content as an AlternativeFormatImportPart (altChunk), I decided to parse it first and insert it as pure OpenXML.

    • To perform the conversion, I'm using the html2openxml library as a base. I've extended it by overloading the .Parse(...) method as follows:

      1. If an "a href=..." tag is found, we analyze the href value to decide whether it's a link to our internal server.
      2. If the href points to our server, we replace the "a href" tag with a special serialized class that contains the file's URL and the icon's image URL.
      3. We let the original .Parse method perform the conversion.
      4. We analyze the IList returned by the original .Parse method to find the OpenXML Text elements whose content is our special serialized class containing the links.
      5. We replace each OpenXML Run element containing one of these Text elements with a Run element that references an embedded object holding the file's binary content, plus a shape containing the icon's image.
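Steps 1 and 2 might look roughly like the sketch below. It is not the original implementation: the class name AnchorPlaceholders, the method ReplaceServerAnchors, and the "[[OLE|file|icon]]" token (standing in for the "special serialized class") are all assumptions made for illustration. A regular expression is used only because the editor's HTML is described as simple; it is not a general HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of html2openxml or the original solution.
public static class AnchorPlaceholders
{
    // Matches <a ... href="..."> ... </a> and captures the href value.
    private static readonly Regex AnchorRegex = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*[\"'](?<href>[^\"']*)[\"'][^>]*>.*?</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces anchors whose href starts with serverRoot by a text token of the
    // assumed form [[OLE|<file url>|<icon url>]]. Both URLs are percent-encoded
    // so they cannot contain '|' or ']' and survive the HTML parser as plain text.
    public static string ReplaceServerAnchors(string html, string serverRoot, string iconUrl)
    {
        return AnchorRegex.Replace(html, m =>
        {
            // Decode HTML entities such as &amp; inside the attribute value.
            string href = WebUtility.HtmlDecode(m.Groups["href"].Value);

            // Step 1: links that do not point at our server stay untouched.
            if (!href.StartsWith(serverRoot, StringComparison.OrdinalIgnoreCase))
            {
                return m.Value;
            }

            // Step 2: swap the whole anchor for the placeholder token.
            return "[[OLE|" + Uri.EscapeDataString(href) + "|" +
                   Uri.EscapeDataString(iconUrl) + "]]";
        });
    }
}
```

Because the token is plain text, the unmodified parser in step 3 carries it through into a Text element, where steps 4 and 5 can find it again.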

    I'm also sharing a code stub here so you can see how I've extended the functionality. If anyone is interested in the full solution, please let me know.

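        /// <summary>
        /// Parses the HTML, replacing anchors that point to documents on the
        /// server with embedded OLE objects.
        /// </summary>
        /// <param name="html">The HTML to convert.</param>
        /// <param name="embeddServerLinksAsObjects">If true, links to files on the server are embedded as OLE objects.</param>
        /// <returns>The converted OpenXML elements, or null if the conversion fails.</returns>
        public IList<OpenXmlCompositeElement> Parse(string html, bool embeddServerLinksAsObjects)
        {
            try
            {
                if (embeddServerLinksAsObjects)
                {
                    // Steps 1-2: swap server anchors for serialized placeholders.
                    html = ReplaceAnchorLinksByOXMLLinks(html, this.serverRoot);
                }

                // Step 3: let the original parser do the HTML-to-OpenXML conversion.
                IList<OpenXmlCompositeElement> oceList = base.Parse(html);

                if (embeddServerLinksAsObjects)
                {
                    // Steps 4-5: turn the placeholders into runs with embedded OLE objects.
                    oceList = ReplaceOXMLLinksByOLEObjects(oceList, this.mainDocumentPart, this.serverRoot);
                }

                return oceList;
            }
            catch (Exception)
            {
                // NOTE: the exception is swallowed in this stub; log or rethrow it in real code.
                return null;
            }
        }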
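
The stub calls ReplaceOXMLLinksByOLEObjects, which is not shown. Its first job (step 4) is to recognise the placeholders in the parsed text. A minimal sketch of that detection follows, assuming the placeholder is a text token of the form "[[OLE|file|icon]]" with both URLs percent-encoded. The token format and the names EmbeddedLink and PlaceholderScanner are hypothetical, not taken from the original solution. In the real method the input string would be the Text property of each OpenXML Text element.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical value object describing one file to embed.
public sealed class EmbeddedLink
{
    public string FileUrl { get; }
    public string IconUrl { get; }

    public EmbeddedLink(string fileUrl, string iconUrl)
    {
        FileUrl = fileUrl;
        IconUrl = iconUrl;
    }
}

// Hypothetical helper that finds placeholder tokens in a run's text.
public static class PlaceholderScanner
{
    // Assumed token format: [[OLE|<percent-encoded file url>|<percent-encoded icon url>]]
    private static readonly Regex TokenRegex = new Regex(
        @"\[\[OLE\|(?<file>[^|\]]*)\|(?<icon>[^|\]]*)\]\]");

    // Returns one EmbeddedLink per token found, in document order.
    // Text without tokens (or null) yields an empty list.
    public static IList<EmbeddedLink> FindLinks(string text)
    {
        var links = new List<EmbeddedLink>();
        foreach (Match m in TokenRegex.Matches(text ?? string.Empty))
        {
            links.Add(new EmbeddedLink(
                Uri.UnescapeDataString(m.Groups["file"].Value),
                Uri.UnescapeDataString(m.Groups["icon"].Value)));
        }
        return links;
    }
}
```

For each link found, step 5 would then download FileUrl, add it to the package as an embedded object part, and replace the Run with one that references that part and the icon image. That part depends on the OpenXML SDK and is left out of this sketch.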