c# asp.net-core docx openxml-sdk

Converting a Word document from docx to HTML on an ASP.NET Core backend and returning it to the frontend


I have a Vue frontend that sends an API request to the backend, which queries the MS Graph API to pull a Word document from OneDrive. I need this document converted to HTML so I can load its contents into a rich text editor input box (which takes HTML). Converting from docx to HTML in C# seems doable with the OpenXmlPowerTools NuGet package. I have successfully run the code below to make the conversion, but I am not sure of the best way to get the result to the frontend, as returning it with Ok() raises an error:

[HttpPost]
[ValidateAntiForgeryToken]
public async Task<ActionResult> GetEmailHTML([FromForm] string PPID)
{

    String query = "select dbo.scalarfunction(@ID)";

    using (var connection = new SqlConnection(connectionString))
    {
        // The anonymous-object property name must match the @ID placeholder in the query.
        var FilePath = connection.ExecuteScalar<string>(query, new { ID = PPID });

        using (var memoryStream = await oneDrive.DriveItemDownloadAsync(SharedDriveID, Path.Combine(fileDirectory, FilePath), "path"))
        {
            using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
            {
                HtmlConverterSettings settings = new HtmlConverterSettings()
                {
                    PageTitle = "Title"
                };
                XElement html = HtmlConverter.ConvertToHtml(doc, settings);

                return Ok(html);
            }

        }
    }
}

Inside the HTML object there are a first node and a last node, but as you keep expanding them they turn out to be deeply nested. I am not sure how to parse this, or how to return just the body tag of the converted Word document to the frontend, as that's all I really need. I do not need the header and all that stuff.


Solution

  • You don't necessarily need to return ActionResult and wrap the HTML in Ok(), unless you are also returning other action results like BadRequest().

    I would change the return type to Task<ContentResult> and remove the Ok() for now. The error may be related to the front end not being able to parse a complex type like XElement. Can you share the error you are getting?

    As for the other question regarding parsing the XElement, you can filter it down to the body element and its contents. You can do this using LINQ like so:

    XNamespace xhtml = "http://www.w3.org/1999/xhtml";
    var bodyElement = html.Descendants(xhtml + "body").FirstOrDefault();
    string bodyHtml = bodyElement?.ToString();
    

    Then return it as a ContentResult:

    return new ContentResult
    {
        ContentType = "text/html",
        Content = bodyHtml,
        StatusCode = 200
    };
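
Note that calling ToString() on the body element, as in the LINQ snippet above, returns the markup including the <body> tag itself. If the rich text editor should receive only what is inside the body, one option is to serialize the body's child nodes instead. The following is a minimal sketch using only System.Xml.Linq. The hand-built tree is a hypothetical stand-in for the XElement that HtmlConverter.ConvertToHtml returns, and GetBodyInnerHtml is an illustrative helper name, not part of OpenXmlPowerTools. It assumes the converted markup is in the XHTML namespace, as the answer above does.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace xhtml = "http://www.w3.org/1999/xhtml";

// Hypothetical stand-in for the converter output: an <html> root with
// <head> and <body>, all in the XHTML namespace.
XElement html = new XElement(xhtml + "html",
    new XElement(xhtml + "head",
        new XElement(xhtml + "title", "Title")),
    new XElement(xhtml + "body",
        new XElement(xhtml + "p", "Hello ", new XElement(xhtml + "b", "world")),
        new XElement(xhtml + "p", "Second paragraph")));

string bodyInnerHtml = GetBodyInnerHtml(html);
Console.WriteLine(bodyInnerHtml);

// Returns the markup of everything inside <body>, without the <body> tag.
// Returns an empty string when no <body> element is found.
static string GetBodyInnerHtml(XElement root)
{
    XNamespace ns = "http://www.w3.org/1999/xhtml";
    var body = root.DescendantsAndSelf(ns + "body").FirstOrDefault();
    if (body is null)
    {
        return string.Empty;
    }

    // DisableFormatting keeps the serializer from adding indentation
    // whitespace, which could otherwise show up as extra text nodes.
    return string.Concat(
        body.Nodes().Select(node => node.ToString(SaveOptions.DisableFormatting)));
}
```

Because each child is serialized on its own, every top-level element in the result carries an xmlns="http://www.w3.org/1999/xhtml" attribute. Editors will probably tolerate this, but that is an assumption worth checking against the one in use. The resulting string can be assigned to Content in the ContentResult shown above in place of bodyHtml.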