c# .net regex imdb

Parsing Movie title with RegEx


I have three strings from which I want to extract the movie title, if possible with a single regular expression:

<title>Airplane! (1980)</title>    

<title>&#x22;24&#x22; (2001)</title>    

<title>&#x22;Agents of S.H.I.E.L.D.&#x22; The Magical Place (2014)</title>

My best shot so far is this one:

<title>(&#x22;)?(.*?)(&#x22;)?.*?\((\d{4})\).*?</title>

Works fine for "Agents of S.H.I.E.L.D." and "24" but not for "Airplane!".

What am I doing wrong?

In case it isn't clear: the regular expression is called from within a C# program, using the Regex class.


Solution

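  • In the original pattern, everything between <title> and the year is optional or lazy, so the lazy group (.*?) can be satisfied by the empty string while the following .*? consumes the title.
  • Decode the HTML entities first, then match: start of line => opening tag => optional " => lazily capture up to the next " or the (nnnn) year.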

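    // Requires: using System.Text.RegularExpressions;
    // Decode entities first so that &#x22; becomes a plain double quote.
    titles = System.Net.WebUtility.HtmlDecode(titles);

    // In a verbatim string, "" is one literal quote, so \"" matches a double quote.
    foreach (Match match in Regex.Matches(titles,
             @"^\s*<title>\s*\""*(.*?)(\""|\(\d{4}\))",
             RegexOptions.Multiline | RegexOptions.IgnoreCase))
    {
        // Regex.Matches yields only successful matches, so no Success check is needed.
        // Trim drops the space left before "(1980)" in unquoted titles.
        string name = match.Groups[1].Value.Trim();
    }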
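
If the year is needed as well, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the class name TitleParser, the method TryParse, and the pattern itself are hypothetical and not part of the snippet above, and each input is assumed to hold a single <title> element on one line.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): extracts title and year.
public static class TitleParser
{
    // <title>, optional opening quote, lazily captured title without quotes,
    // optional closing quote plus episode text, then the four-digit year.
    private static readonly Regex TitlePattern = new Regex(
        @"<title>\s*""?(?<title>[^""]+?)\s*(?:"".*?)?\((?<year>\d{4})\)\s*</title>",
        RegexOptions.IgnoreCase);

    public static bool TryParse(string html, out string title, out int year)
    {
        title = null;
        year = 0;

        // Decode first so that &#x22; is seen as a plain double quote.
        string decoded = WebUtility.HtmlDecode(html);

        Match m = TitlePattern.Match(decoded);
        if (!m.Success)
        {
            return false;
        }

        title = m.Groups["title"].Value;
        year = int.Parse(m.Groups["year"].Value);
        return true;
    }
}
```

Called on the three sample strings from the question, TryParse should return true each time, with the titles Airplane!, 24 and Agents of S.H.I.E.L.D. and the years 1980, 2001 and 2014. The title group excludes quotes, so for quoted series names the episode text after the closing quote is skipped rather than captured.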