Tags: c#, asp.net-core, openxml-sdk

How to read docx file from a URL using .NET


I want to read the content of a Word file fetched over an HTTP request in .NET Core 2.2.

I tried the following code:

// Create a new WebClient instance.
using (WebClient myWebClient = new WebClient())
{
    // Download the Web resource and save it into a data buffer.
    byte[] myDataBuffer = myWebClient.DownloadData(body.SourceUrl);

    // Display the downloaded data.
    string download = Encoding.ASCII.GetString(myDataBuffer);
}


I am not able to read the content of the .docx file from the URL. How can I read a .docx file fetched over an HTTP web request without using any paid library?


Solution

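  • A .docx file is a ZIP package of XML parts, so decoding its raw bytes with Encoding.ASCII produces unreadable text. You can use the free Open XML SDK (the DocumentFormat.OpenXml NuGet package) to process Word documents: https://learn.microsoft.com/en-us/previous-versions/office/developer/office-2010/cc535598(v=office.14)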

    This is probably what you are looking for:

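    // Requires the DocumentFormat.OpenXml NuGet package and these usings:
    // using System.IO; using System.Net; using DocumentFormat.OpenXml.Packaging;
    string content;

    // Create a new WebClient instance.
    using (WebClient myWebClient = new WebClient())
    {
        // Download the Web resource and save it into a data buffer.
        byte[] bytes = myWebClient.DownloadData(body.SourceUrl);

        // Wrap the downloaded bytes in a stream and dispose it when done.
        using (MemoryStream memoryStream = new MemoryStream(bytes))
        // Open a WordprocessingDocument for read-only access based on a stream.
        using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(memoryStream, false))
        {
            MainDocumentPart mainPart = wordDocument.MainDocumentPart;
            // InnerText returns the text of the whole body as one string.
            content = mainPart.Document.Body.InnerText;
        }
    }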
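
In an ASP.NET Core app you may prefer an async variant of the same idea. The following is only a sketch under a few assumptions: the class name DocxReader and the method name ReadDocxTextAsync are made up for illustration, the DocumentFormat.OpenXml NuGet package is installed, and the URL returns the raw .docx bytes without requiring authentication. It uses HttpClient instead of WebClient and joins the paragraphs with line breaks, because Body.InnerText concatenates all the text without separators.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper; the class and method names are illustrative only.
public static class DocxReader
{
    // Reuse a single HttpClient instead of creating one per request.
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<string> ReadDocxTextAsync(string sourceUrl)
    {
        // Download the whole file into memory before opening it as a package.
        byte[] bytes = await Http.GetByteArrayAsync(sourceUrl);

        using (var memoryStream = new MemoryStream(bytes))
        // Second argument false = open the document read-only.
        using (var wordDocument = WordprocessingDocument.Open(memoryStream, false))
        {
            Body body = wordDocument.MainDocumentPart?.Document?.Body;
            if (body == null)
            {
                // No main document part or body: nothing to read.
                return string.Empty;
            }

            // Join paragraph texts explicitly so paragraph breaks survive.
            return string.Join(
                Environment.NewLine,
                body.Descendants<Paragraph>().Select(p => p.InnerText));
        }
    }
}
```

From a controller action it could be called as `string content = await DocxReader.ReadDocxTextAsync(body.SourceUrl);`. Note that Descendants<Paragraph>() also visits paragraphs nested inside tables and text boxes, so text inside a text box can show up twice; if that matters for your documents, filter the paragraphs accordingly.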