wpf, xps, xpsdocument

WPF: find all regex matches in an XPS document


I need to search for an expression inside an XPS document and then list all matches, together with the page number of each match.

I searched on Google but found no reference or sample that addresses this issue.

So: how can I search an XPS document and get this information?


Solution

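  • The first thing to note is that an XPS file is an Open Packaging Conventions (OPC) package, so it can be opened and its contents accessed via the System.IO.Packaging.Package class. This makes operations on the contents much easier.

    Here's an example of how to search the page content with a given regex, while also tracking which page each match occurs on. Note that the regex runs against each page's raw FixedPage XML, so it can also match markup (the sample pattern would hit part of an element name such as PathGeometry), not only the visible text.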

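    using System;
    using System.IO;
    using System.IO.Packaging;   // lives in WindowsBase on .NET Framework
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    var regex = new Regex(@"th\w+", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Open read-only; Package.Open(path) alone defaults to FileMode.OpenOrCreate with read/write access.
    using (var xps = Package.Open(@"C:\path\to\regex.oxps", FileMode.Open, FileAccess.Read))
    {
        // Each FixedPage part holds the markup of one page.
        var pages = xps.GetParts()
            .Where(p => p.ContentType == "application/vnd.ms-package.xps-fixedpage+xml")
            .ToList();

        for (var i = 0; i < pages.Count; i++)
        {
            var page = pages[i];
            using (var reader = new StreamReader(page.GetStream(FileMode.Open, FileAccess.Read)))
            {
                // The regex sees the page's raw XML markup, not only its visible text.
                var s = reader.ReadToEnd();
                var matches = regex.Matches(s);

                if (matches.Count > 0)
                {
                    // Join all matched values on this page into one space-separated string.
                    var matchText = matches
                        .Cast<Match>()
                        .Aggregate(new StringBuilder(), (agg, m) => agg.AppendFormat("{0} ", m.Value));
                    // i + 1 is the part's position in GetParts() order, assumed here to match the page order.
                    Console.WriteLine("Found matches on page {0}: {1}", i + 1, matchText);
                }
            }
        }
    }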
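
Because each FixedPage part is XML, a pattern applied to the raw part content can also match element or attribute names. Below is a minimal sketch of one way to search only the visible text. It assumes the text is stored in the UnicodeString attribute of Glyphs elements, as is typical for XPS output. The names XpsTextSearch, ExtractPageText and Search are hypothetical, chosen for illustration only. Page numbers again follow the order returned by GetParts(), which is assumed, not guaranteed, to match the document's page order. Matches that span two Glyphs runs and the "{}" escape prefix of UnicodeString are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class XpsTextSearch   // hypothetical helper class, not a library type
{
    // Joins the UnicodeString attribute of every Glyphs element found in
    // one FixedPage's XML, separated by single spaces.
    public static string ExtractPageText(string fixedPageXml)
    {
        var doc = XDocument.Parse(fixedPageXml);
        var runs = doc.Descendants()
            .Where(e => e.Name.LocalName == "Glyphs")          // ignore the XML namespace
            .Select(e => (string)e.Attribute("UnicodeString")) // null when the attribute is absent
            .Where(t => !string.IsNullOrEmpty(t));
        return string.Join(" ", runs);
    }

    // Yields (page, matched text) pairs for every regex match in the extracted text.
    public static IEnumerable<(int Page, string Value)> Search(string path, Regex regex)
    {
        using (var xps = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            var pages = xps.GetParts()
                .Where(p => p.ContentType == "application/vnd.ms-package.xps-fixedpage+xml")
                .ToList();

            for (var i = 0; i < pages.Count; i++)
            {
                using (var reader = new StreamReader(pages[i].GetStream(FileMode.Open, FileAccess.Read)))
                {
                    var text = ExtractPageText(reader.ReadToEnd());
                    foreach (Match m in regex.Matches(text))
                        yield return (i + 1, m.Value);
                }
            }
        }
    }
}
```

A caller could then iterate XpsTextSearch.Search(@"C:\path\to\regex.oxps", regex) and print each page and value pair. Returning pairs instead of writing to the console keeps the search reusable, for example to fill a list in a WPF view.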