Tags: c#, html, asp.net-core, restsharp

How to get the body content of a web page returned by an API in ASP.NET Core


The response from an API is a web page with full HTML and CSS content. The only thing I want is the content in the body.

How can I extract the body content from the web page?

Below is a shortened version of the web page. The full page is very long, so I can't post everything here.

The body content I want to extract is: "Hi John, Doe wishes you a happy anniversary and wants all of us at FCMB to wish you same, Congratulations on your anniversary Doe"

<!DOCTYPE html>
<html>
<head>
    <style>
        body {padding: 0; margin: 0; font-family: sans-serif;}
        .general-container {min-height: 100vh; border-radius: 6px; }
    </style>
</head>
<body>
    <div class="modal fade" id="CustomerPreviewMsg" tabindex="-1" role="dialog" aria-labelledby="exampleModalCenterTitle" aria-hidden="true">
        <div class="modal-dialog modal-dialog-centered" role="document">
            <div class="modal-content">
                <div class="modal-header">
                    <button type="button" class="close" data-dismiss="modal" aria-label="Close">
                        <span aria-hidden="true">&times;</span>
                    </button>
                </div>
                <div class="modal-content">
                    <div class="modal-body mb-0 p-0">
                        <div class="row mx-0 col-12 profile-pic-container">
                            <p class="pt-3">
                                Hi John, Doe wishes you a happy anniversary and wants all of us at FCMB to wish you same, Congratulations on your anniversary Doe
                            </p>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</body>
</html>
<script src="/Scripts/jquery-3.3.1.js"></script>
<script src="/JsFile/MainJs.js"></script>
<script src="https://unpkg.com/wavesurfer.js"></script>

This is the code that consumes the endpoint:

var client = new RestClient(appSettings.ShoutOutPreviewUrl + previewMessage.MessageHistoryId);
client.AddDefaultHeader("Authorization", string.Format("Bearer {0}", appSettings.ShoutOutToken));
client.Timeout = -1; // -1 disables the timeout
var request = new RestRequest(Method.GET);
request.AddHeader("Content-Type", "text/plain");

// Send the request once. The earlier version called ExecuteAsync and then
// Execute<string>, which hit the endpoint twice and discarded the first response.
IRestResponse response = await client.ExecuteAsync(request);

// Content holds the full HTML page as a string.
return response.Content;

Solution

  • After some digging, I used HtmlAgilityPack (https://html-agility-pack.net/), which I installed via NuGet, to select the node that holds the message:

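    // Requires: using HtmlAgilityPack;
    internal string ParseHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the XPath.
        var node = doc.DocumentNode.SelectSingleNode("//p[@class='pt-3']");
        if (node == null)
        {
            return string.Empty;
        }

        // InnerText drops the markup; Trim removes the surrounding whitespace.
        return node.InnerText.Trim();
    }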
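
The answer above targets one specific paragraph by its class. If the class name is not known in advance, a more general variant can read the text of the whole body instead. The following is only a minimal sketch under a few assumptions: the class name BodyTextExtractor and the method name ExtractBodyText are hypothetical (they do not appear in the original answer), and removing script and style elements and collapsing whitespace are choices made here for illustration, not part of the original solution.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper, not part of the original answer.
public static class BodyTextExtractor
{
    public static string ExtractBodyText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Drop <script> and <style> elements so their contents do not end up in the text.
        // ToList() materializes the lazy Descendants() sequence before nodes are removed.
        doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToList()
            .ForEach(n => n.Remove());

        // SelectSingleNode returns null when the document has no <body> element.
        var body = doc.DocumentNode.SelectSingleNode("//body");
        if (body == null)
        {
            return string.Empty;
        }

        // Decode HTML entities (for example &amp;) into plain characters.
        string text = HtmlEntity.DeEntitize(body.InnerText);

        // Collapse runs of whitespace (newlines, indentation) into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Note that this returns the text of every element in the body, so for the page in the question the result would likely also contain the text of the close button. When only the message is wanted, the narrower XPath from the answer remains the better fit.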