C#: How can I parse the video URLs from HTML data?


I have HTML data like the following:

<h4> Daily Prayer Powered by Hallow App: </h4>\r\n<center><iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/videoseries?list=PLrM5hI29ZNuUmJ6wq15_-qUN8GAWxDs5l\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen=\"\"></iframe></center>\r\n<h3> The featured videos, the Queenship of Mary & Our Lady of Knock, can be found at the bottom of this page. </h3>\r\n\r\n<h4>First Reading - Adapted from 2Thes. 2:1-3A, 14-17 </h4>\r\n<p>Brothers and sisters: </p>\r\n<p>If you hear a statement or a letter supposedly from us that the Lord Jesus Christ is coming now, do not be alarmed.  Do not let anyone trick you.  He has called you through the Gospel to receive the glory of our Lord Jesus.  So be strong and faithful to the traditions which you were taught by us, either orally or in writing.  May our Lord Jesus Christ and God our Father, who loves us and encourages us and gives us hope, encourage you and strengthen your hearts to do every good work. </p>\r\n\r\n<h4>Psalm 96:10-13 </h4>\r\n<p>Say among the nations, \"The LORD reigns! </p>\r\n<p>    Yea, the world is established, it shall never be moved; </p>\r\n<p>    he will judge the peoples with equity.\"</p>\r\n<p>Let the heavens be glad, and let the earth rejoice; </p>\r\n<p>    let the sea roar, and all that fills it; </p>\r\n<p>    let the field exult, and everything in it! </p>\r\n<p>Then shall all the trees of the wood sing for joy </p>\r\n<p>    before the LORD, for he comes, </p>\r\n<p>    for he comes to judge the earth. </p>\r\n<p>He will judge the world with righteousness, </p>\r\n<p>    and the peoples with his truth. </p>\r\n\r\n<h4>Gospel - Adapted from Matthew 23:23-26 </h4>\r\n<p>Jesus said: </p>\r\n<p>\"Great sadness to you, scribes and Pharisees, you hypocrites!  
You give valuable spices to the temple, but you do not give more important things - justice, and mercy, and faithfulness.  You should have done these as well as the others.  You are blind guides who focus on religious details but ignore large sins.  You appear clean on the outside, but inside you are full of selfishness.  Blind Pharisee, first be clean on the inside, and the outside will also be clean.\"</p>\r\n\r\n<h4>Featured Videos:</h4>\r\n<h3> The Queenship of Mary </h3>\r\n<p> Watch the Catholic Brain video the Queenship of Mary below: </p>\r\n<div class=\"video-responsive\"><iframe src=\"https://player.vimeo.com/video/447623879\" width=\"640\" height=\"360\" frameborder=\"0\" webkitallowfullscreen=\"\" mozallowfullscreen=\"\" allowfullscreen=\"\"></iframe></div>\r\n<p>The Catholic Brain lesson the Queenship of Mary can be <a href=\"https://www.catholicbrain.com/edu-lesson/1053474/video/the-queenship-of-mary\" target=\"_blank\">found here</a>.</p>\r\n<h3> Our Lady of Knock </h3>\r\n<p> Watch the Catholic Brain video Our Lady of Knock below: </p>\r\n<div class=\"video-responsive\"><iframe src=\"https://player.vimeo.com/video/447228009\" width=\"640\" height=\"360\" frameborder=\"0\" webkitallowfullscreen=\"\" mozallowfullscreen=\"\" allowfullscreen=\"\"></iframe></div>\r\n<p>The Catholic Brain lesson Our Lady of Knock can be <a href=\"https://www.catholicbrain.com/edu-lesson/1051554/video/our-lady-of-knock\" target=\"_blank\">found here</a>.</p>

This data contains two Vimeo video URLs:

  1. https://player.vimeo.com/video/447623879
  2. https://player.vimeo.com/video/447228009

I need to extract only the video URLs into a string array. There may be one or more video URLs, and I need all of them in the array. How can I parse them out of the HTML data using C#?

Update:

I tried the code below and got only the first iframe src link. But there are a few other src links in my data. How can I fetch all of the iframe src links?

string src = Regex.Match(description, "<iframe.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
Debug.WriteLine("src:>>" + src);

Solution

  • I parsed the video links using the code below:

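    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
    List<string> vimeoVideoList = new List<string>();
    // Regex.Matches finds every <iframe> tag; capture group 1 holds its src value.
    var allMatches = Regex.Matches(description, "<iframe.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase);
    foreach (Match match in allMatches)
    {
        // The group is already captured, so no second Regex.Match is needed.
        // Note: this collects every iframe src (YouTube included), not only Vimeo.
        vimeoVideoList.Add(match.Groups[1].Value);
    }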
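
The question asks for only the Vimeo URLs as a string array, while the loop above collects the src of every iframe, including the YouTube playlist embed. Below is a minimal sketch of one way to narrow the result. It assumes Vimeo embeds can be recognized by the substring `player.vimeo.com/video/` and that the quotes in the input are plain quotes (the `\"` escapes already decoded). The names `IframeSources` and `ExtractVimeoUrls` are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class IframeSources
{
    // Matches an <iframe ...> opening tag and captures the value of its src attribute.
    // [^>]*? keeps the match inside a single tag instead of running across tags.
    private static readonly Regex IframeSrc = new Regex(
        "<iframe\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the src of every iframe whose URL looks like a Vimeo player embed.
    // Assumption: "player.vimeo.com/video/" is enough to identify a Vimeo video.
    public static string[] ExtractVimeoUrls(string html)
    {
        return IframeSrc.Matches(html)
            .Cast<Match>()                        // MatchCollection -> IEnumerable<Match>
            .Select(m => m.Groups[1].Value)       // group 1 is the src value
            .Where(src => src.Contains("player.vimeo.com/video/"))
            .ToArray();
    }
}
```

Calling `IframeSources.ExtractVimeoUrls(description)` on input shaped like the HTML in the question should return the two `player.vimeo.com` URLs and skip the YouTube embed. A regular expression is adequate for simple, predictable markup like this; for arbitrary HTML, a real HTML parser is usually more robust.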