Tags: c#, string, c#-4.0, string-parsing

How do I split a string between two strings in C#?


I have a string variable that contains HTML data. I want to split that HTML string into multiple strings and then merge those strings back into a single one.

This is the HTML string:

<p><span style="text-decoration: underline; color: #ff0000;"><strong>para1</strong></span></p>
<p style="text-align: center;"><strong><span style="color: #008000;">para2</span> स्द्स्द्सद्स्द para2 again<br /></strong></p>
<p style="text-align: left;"><strong><span style="color: #0000ff;">para3</span><br /></strong></p>

And this is my expected output:

<p><span style="text-decoration: underline; color: #ff0000;"><strong>para1</strong></span><strong><span style="color: #008000;">para2</span>para2 again<br /></strong><strong><span style="color: #0000ff;">para3</span><br /></strong></p>

My split logic is given below:

  1. Split the HTML string into tokens based on the </p> tag.
  2. Take the first token and store it in a separate string variable (firstPara).
  3. Take each remaining token and remove any opening tag starting with <p and any closing </p> tag, storing each value in a separate variable.
  4. Take the first token (firstPara), remove its </p> tag, and append every token obtained in step 3.
  5. Now the variable firstPara holds the whole value.
  6. Finally, append </p> to the end of firstPara.
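
The steps above can be sketched with plain string operations, without regex or an HTML parser. This is only a minimal sketch under some assumptions: the class name ParagraphMerger, the method name Merge, and the sample input in Main are hypothetical, the closing tag is always written as lowercase </p>, paragraphs are not nested, and every token after the first begins with its opening <p ...> tag.

```csharp
using System;
using System.Text;

// Hypothetical helper that follows the split-and-merge steps from the question.
static class ParagraphMerger
{
    public static string Merge(string html)
    {
        // Step 1: split on the closing tag. The split removes "</p>" itself,
        // so no token still contains it.
        string[] tokens = html.Split(new[] { "</p>" }, StringSplitOptions.None);

        var firstPara = new StringBuilder();
        foreach (string token in tokens)
        {
            // Skip the whitespace-only leftovers between and after paragraphs.
            string part = token.Trim();
            if (part.Length == 0)
            {
                continue;
            }

            if (firstPara.Length == 0)
            {
                // Step 2: the first token keeps its opening <p ...> tag.
                firstPara.Append(part);
                continue;
            }

            // Step 3: drop the opening <p ...> tag of every later token.
            // Assumption: the first '>' in the token closes that <p ...> tag.
            if (part.StartsWith("<p", StringComparison.OrdinalIgnoreCase))
            {
                int tagEnd = part.IndexOf('>');
                if (tagEnd >= 0)
                {
                    part = part.Substring(tagEnd + 1);
                }
            }

            // Step 4: append the stripped token to the first paragraph.
            firstPara.Append(part);
        }

        // Steps 5 and 6: close the merged paragraph (only if anything was found).
        if (firstPara.Length > 0)
        {
            firstPara.Append("</p>");
        }
        return firstPara.ToString();
    }

    static void Main()
    {
        // Hypothetical sample input, shorter than the HTML in the question.
        string html = "<p><b>one</b></p>\n<p style=\"text-align: center;\">two</p>\n<p>three<br /></p>";
        Console.WriteLine(Merge(html));
        // prints: <p><b>one</b>twothree<br /></p>
    }
}
```

Plain string handling like this breaks easily on real-world HTML (uppercase tags, a '>' inside an attribute value, nested markup), so it is best treated as an illustration of the steps rather than a robust solution.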

That is my problem.

Could you please help me solve it?


Solution

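  • Here is a regex example of how to do it.

    using System.Text;
    using System.Text.RegularExpressions;

    // Match the content between each opening <p ...> tag and the next </p>.
    // The lazy .*? with RegexOptions.Singleline keeps every match inside one
    // paragraph, even if the HTML is on a single line or a paragraph spans lines.
    string pattern = @"(?<=<p\b[^>]*>).*?(?=</p>)";
    var matches = Regex.Matches(text, pattern, RegexOptions.Singleline);
    StringBuilder result = new StringBuilder();
    result.Append("<p>");
    foreach (Match match in matches)
    {
        result.Append(match.Value);
    }
    result.Append("</p>");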
    

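    And this is how you should do it with Html Agility Pack:

    using System.Text;
    using HtmlAgilityPack;

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(text);
    // SelectNodes returns null (not an empty collection) when nothing matches.
    var nodes = doc.DocumentNode.SelectNodes("//p");
    StringBuilder result = new StringBuilder();
    result.Append("<p>");
    if (nodes != null)
    {
        foreach (HtmlNode node in nodes)
        {
            // InnerHtml is the markup inside the <p> element, without the <p> tags.
            result.Append(node.InnerHtml);
        }
    }
    result.Append("</p>");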