c# linq morelinq

Retrieve strings from a file, filtering with Linq when multiple lines contain the exact same string


I am using Visual Studio with the MoreLinq NuGet package for the solution below.

Example contents of the file that I wish to retrieve (the file also contains other, irrelevant data):

...
#define HELLO
#include "hello.h"

code

#define BYE
#include "hello.h"
...

My attempt does almost exactly what I want, and I can see why it falls short:

var files = from file in Directory.EnumerateFiles(path, "*.*", SearchOption.AllDirectories).Where(s => s.EndsWith(".c") || s.EndsWith(".h"))
            from line in File.ReadLines(file)
            .SkipWhile(l => l.TrimStart() != ("#define HELLO"))
            .TakeUntil(l => l.TrimStart() == ("#define BYE"))
            .ToList()
            select new
            {
                File = file,
                Line = line
            };

foreach (var f in files)
{
    storedLines.Add(f.Line.Trim());
}

At this point my solution would give me the following results:

#define HELLO
#include "hello.h"

code

#define BYE

Note that it is missing the last line I also wanted to retrieve: #include "hello.h". To fix this, I tried adding another TakeUntil call to the query:

...
.SkipWhile(l => l.TrimStart() != ("#define HELLO"))
.TakeUntil(l => l.TrimStart() == ("#define BYE"))
.TakeUntil(l => l.TrimStart() == ("#include \"hello.h\""))
...

But this (as expected) returned only the following results:

#define HELLO
#include "hello.h"

This completely ignores the rest of the wanted lines, because #include "hello.h" appears multiple times and the query stops at the first occurrence.
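
To make the early stop concrete, here is a minimal sketch that reproduces it without the NuGet dependency. TakeUntilInclusive is a hypothetical stand-in written for this example; it assumes MoreLinq's TakeUntil returns every item up to and including the first one that matches, then stops. Under that assumption, whichever chained call matches first ends the sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeUntilDemo
{
    // Hypothetical stand-in for MoreLinq's TakeUntil (assumed behaviour):
    // yield items up to and including the first match, then stop.
    public static IEnumerable<T> TakeUntilInclusive<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)
        {
            yield return item;
            if (predicate(item)) yield break;
        }
    }

    // Same shape as the attempted query: two chained "until" calls.
    public static List<string> Chained(IEnumerable<string> lines) =>
        lines
            .SkipWhile(l => l.TrimStart() != "#define HELLO")
            .TakeUntilInclusive(l => l.TrimStart() == "#define BYE")
            // This call sees the first #include "hello.h" long before the
            // sequence reaches #define BYE, so it ends the sequence there.
            .TakeUntilInclusive(l => l.TrimStart() == "#include \"hello.h\"")
            .ToList();
}
```

The second call cannot tell the first #include "hello.h" from the second; it only sees one line at a time, so some memory of having passed #define BYE is needed.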

I want to retrieve only these lines from the mentioned file, without missing one of the lines:

#define HELLO
#include "hello.h"

code

#define BYE
#include "hello.h"

For a solution that still uses LINQ, see @Freggar's answer below.


Solution

  • You could set a flag in TakeUntil that indicates that you are past #define BYE:

    bool byeFlag = false;
    var p = from line in File.ReadLines(file)
            .SkipWhile(l => l.TrimStart() != ("#define HELLO"))
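            .TakeUntil(l =>
            {
                // TakeUntil also returns the line on which the predicate is true,
                // so signal "stop" one line late: on the line after #define BYE.
                bool ret = byeFlag;
                if (l.TrimStart() == "#define BYE")
                {
                    byeFlag = true;
                }
                return ret;
            })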
            .ToList()
            select new
            {
                File = file,
                Line = line
            };
    

    But as already mentioned, maybe LINQ is not really the best tool for what you are trying to do. Maybe parsers like ANTLR are better suited for the job?
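
    One thing to watch with the flag approach: byeFlag is declared once outside the query, so if the snippet is dropped into the multi-file query from the question as-is, the flag would still be true when the second file is read, and it would need resetting per file. A minimal sketch of an alternative that keeps the state inside an iterator method is shown below. BlockExtractor and TakeBlock are hypothetical names made up for this example, not part of LINQ or MoreLinq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlockExtractor
{
    // Hypothetical helper: yields lines from startMarker through the single
    // line that follows endMarker. State lives in locals, so every call
    // (for example, one per file) starts fresh.
    public static IEnumerable<string> TakeBlock(
        IEnumerable<string> lines, string startMarker, string endMarker)
    {
        bool inBlock = false;
        bool endSeen = false;

        foreach (var line in lines)
        {
            var trimmed = line.TrimStart();

            if (!inBlock)
            {
                if (trimmed != startMarker) continue; // still before the block
                inBlock = true;
            }

            yield return line;

            if (endSeen) yield break;                 // this was the line after endMarker
            if (trimmed == endMarker) endSeen = true; // take exactly one more line
        }
    }
}
```

    In the question's query, the inner source would then become something like BlockExtractor.TakeBlock(File.ReadLines(file), "#define HELLO", "#define BYE"). Like the original code, the sketch compares with TrimStart only, so trailing whitespace on a marker line would prevent a match.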