Repeat LINQ query without restarting enumeration


I would like to parse a string that has a repeating pattern using LINQ (namely using the Skip and Take families of methods). Here's an example of the string content:

"p0:{foo:bar}\r\np1:1234\r\np2:abcd"

As you can see, it's almost parsable as JSON. I'm pre-parsing it to prevent later JSON deserialization stages from choking on it.

I have an idea for an approach using LINQ, but I can't figure out how to implement it. Here's an example of the desired implementation:

public JToken[] GetContentOfEachP(string text)
{
    return text
        .Repeat(enumerable =>              // <- this is the method I'd like to write
            enumerable                     // enumerable == text @ the enumerator.Current where we left off at the last iteration of Repeat
                .SkipWhile(c => c != ':')  // skip to the good part
                .Skip(1)                   // skip ':'
                .TakeWhile(c => c != '\r') // take the content between ':' & "\r\n"
                .ToArray()                 // 'Select' {foo:bar} as char[] but without disrupting the current enumeration
        )
        .Select(charArray => JToken.Parse(new string(charArray)))
        .ToArray();
}

foreach (var p in GetContentOfEachP("p0:{foo:bar}\r\np1:1234\r\np2:abcd"))
{
    Console.WriteLine(p.ToString());
}

So this would be a method where the enumeration progresses in chunks. It Skips & Takes the contents of p0 and returns that chunk, then continues enumeration, Skipping & Taking the contents of p1, and so on.

I realize I could just foreach (var p in text.Split("\r\n")) and I may actually end up using that in production code for simplicity (in my current implementation, I'm manually IEnumerator<char>.MoveNext()-ing and strategically yield return-ing, so anything would be more readable at this point), but for curiosity's sake, I'd like to see if anybody has an idea of how to accomplish this using the LINQ approach above.
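
For comparison, the Split-based baseline mentioned above could look roughly like the sketch below. The class and method names are hypothetical, and it returns plain strings rather than JToken so that it does not depend on a JSON library.

```csharp
using System;
using System.Linq;

public static class SplitBaseline
{
    // Hypothetical helper: returns the text after the first ':' on each line.
    public static string[] GetContentOfEachLine(string text)
    {
        return text
            // break the input into "pN:content" lines
            .Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries)
            // keep everything after the first ':' (the whole line if there is no ':')
            .Select(line => line.Substring(line.IndexOf(':') + 1))
            .ToArray();
    }
}
```

Calling SplitBaseline.GetContentOfEachLine on the sample string above yields "{foo:bar}", "1234" and "abcd".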


Solution

  • Thanks @Sweeper for your answer; having some feedback and seeing someone else's approach helped me come up with the idea below, which does succeed in implementing my initial vision.

    using System;
    using System.Collections;
    using System.Collections.Generic;

    public static class ParsingExtensions
    {
        public static IEnumerable<char[]> Repeat(this string source, Func<IEnumerable<char>, char[]> scope)
        {
            // wrap the source enumerable so we can control how it gets enumerated.
            // that's important for...
            using var e = new ContinuousEnumerable(source);
    
            // checking if there are any elements in the collection without skipping ahead...
            while (e.CanMoveNext())
            {
                // and running the "repeated" linq without restarting enumeration every time.
                yield return scope(e); 
            }
        }
    
        private class ContinuousEnumerator : IEnumerator<char>
        {
            private readonly IEnumerator<char> _inner;
    
            private int _state;
    
            public char Current => _inner.Current;
            object IEnumerator.Current => Current;
    
            public ContinuousEnumerator(IEnumerator<char> inner)
            {
                _inner = inner;
            }
    
            public bool PeekNext()
            {
                var result = _inner.MoveNext();
    
                if (result)
                {
                    // now that WE'VE checked if there are more elements,
                    // we need to pretend that we didn't for when the "repeated" LINQ asks.
                    _state = 1;
                }
    
                return result;
            }
    
            public bool MoveNext()
            {
                switch (_state)
                {
                    // if we've peeked above, we already know the answer.
                    case 1:
                        _state = 0;
                        return true;
    
                    // otherwise, ask the inner enumerator as normal.
                    default:
                        return _inner.MoveNext();
                }
            }
    
            public void Reset() => _inner.Reset();
    
            // don't dispose the enumerator when asked, because we're reusing it.
            void IDisposable.Dispose() { }
    
            // our own dispose method to call when the enumerable is disposed.
            public void Dispose() => _inner.Dispose();
        }
    
        private class ContinuousEnumerable : IEnumerable<char>, IDisposable
        {
            private readonly IEnumerable<char> _inner;
    
            private ContinuousEnumerator? _enumerator;
    
            public ContinuousEnumerable(IEnumerable<char> inner)
            {
                _inner = inner;
            }
    
            public IEnumerator<char> GetEnumerator() => GetEnumeratorImpl();
            IEnumerator IEnumerable.GetEnumerator()  => GetEnumeratorImpl();
    
            // always reuse the enumerator so we don't lose our place in the enumeration.
            private ContinuousEnumerator GetEnumeratorImpl()
                => _enumerator ??= new ContinuousEnumerator(_inner.GetEnumerator());
    
    
            // use our PeekNext method to check if enumeration will continue.
            public bool CanMoveNext()
                => GetEnumeratorImpl().PeekNext();
    
            // the enumerable is disposable here, so that Repeat's `using` can release the inner enumerator.
            public void Dispose() => _enumerator?.Dispose();
        }
    }

    There are comments in the code above, but the general idea is to wrap the source enumerable in a custom enumerable (ContinuousEnumerable). That way, you can force enumeration to continue even after the "repeated" LINQ chain completes with ToArray. Because ContinuousEnumerable always returns the first enumerator it created, and that enumerator (ContinuousEnumerator) ignores Dispose calls made through IDisposable, enumeration picks up where it left off at the end of each repetition.
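
    To isolate just the "reuse the enumerator and ignore Dispose" part, here is a minimal sketch. SharedCharEnumerable and SharedEnumerableDemo are hypothetical names, not part of the code above, and the sketch deliberately leaves out the peeking and the real disposal that the full solution handles.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: every GetEnumerator call returns the same enumerator.
public sealed class SharedCharEnumerable : IEnumerable<char>
{
    private readonly NonDisposingEnumerator _shared;

    public SharedCharEnumerable(IEnumerable<char> source)
        => _shared = new NonDisposingEnumerator(source.GetEnumerator());

    public IEnumerator<char> GetEnumerator() => _shared;
    IEnumerator IEnumerable.GetEnumerator() => _shared;

    private sealed class NonDisposingEnumerator : IEnumerator<char>
    {
        private readonly IEnumerator<char> _inner;
        public NonDisposingEnumerator(IEnumerator<char> inner) => _inner = inner;

        public char Current => _inner.Current;
        object IEnumerator.Current => Current;
        public bool MoveNext() => _inner.MoveNext();
        public void Reset() => _inner.Reset();

        // LINQ disposes the enumerator after each query; ignore it so the position survives.
        // (In this sketch the inner enumerator is therefore never disposed.)
        public void Dispose() { }
    }
}

public static class SharedEnumerableDemo
{
    // Runs the same TakeWhile query three times against one shared enumerator.
    public static string[] ReadThreeFields(string text)
    {
        var shared = new SharedCharEnumerable(text);
        string Next() => new string(shared.TakeWhile(c => c != ',').ToArray());
        return new[] { Next(), Next(), Next() };
    }
}
```

    With the input "ab,cd,ef", each call to Next resumes after the comma that ended the previous TakeWhile, so the three results are "ab", "cd" and "ef".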

    Wrapping the source enumerable also allows you to check for remaining elements without the dreaded multiple enumeration. Because we're wrapping the source enumerable's IEnumerator as well, ContinuousEnumerable.CanMoveNext can ask the source enumerator to MoveNext and then pretend that it didn't, so that when the "repeated" LINQ chain asks, the first MoveNext is answered from the remembered result instead of advancing a second time.

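    From there, we just need an alternate way to dispose the enumerators: since the wrapper ignores the Dispose calls LINQ makes, ContinuousEnumerable is itself disposable, and Repeat releases the underlying enumerator when it finishes.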
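
    The peek-and-replay step can also be looked at on its own. The sketch below uses hypothetical names (Peekable, PeekableDemo) and differs slightly from PeekNext above: it remembers an outstanding peek, so calling HasNext twice in a row does not advance twice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lets a caller ask "is there more?" without losing an element.
public sealed class Peekable<T> : IDisposable
{
    private readonly IEnumerator<T> _inner;
    private bool _peeked; // true while the inner enumerator is one step ahead of the caller

    public Peekable(IEnumerable<T> source) => _inner = source.GetEnumerator();

    public T Current => _inner.Current;

    public bool HasNext()
    {
        if (_peeked) return true;      // already advanced once; don't advance again
        _peeked = _inner.MoveNext();   // advance now and remember that we did
        return _peeked;
    }

    public bool MoveNext()
    {
        if (_peeked)
        {
            _peeked = false;           // replay the remembered answer
            return true;
        }
        return _inner.MoveNext();
    }

    public void Dispose() => _inner.Dispose();
}

public static class PeekableDemo
{
    public static List<int> Drain(IEnumerable<int> source)
    {
        var seen = new List<int>();
        using (var e = new Peekable<int>(source))
        {
            // peeking twice before each MoveNext must not skip any element
            while (e.HasNext() && e.HasNext())
            {
                e.MoveNext();
                seen.Add(e.Current);
            }
        }
        return seen;
    }
}
```

    PeekableDemo.Drain(new[] { 1, 2, 3 }) returns all three elements in order, which shows that the checks in the loop condition do not consume anything.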