LogParser isn't open source and I need this functionality for an open source project I'm working on.
I'd like to write a library that allows me to query huge (mostly IIS) log files, preferably with Linq.
Do you have any links that could help me? How does a program like LogParser work so fast? How does it handle memory limitations?
It probably processes the information in the log as it reads it. This means the library doesn't have to allocate a huge amount of memory to hold the whole file: it can read a chunk, process it, and throw it away. This is a common and very effective way to process large inputs.
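To make the read-process-discard idea concrete, here is a minimal sketch in Python. I don't know how LogParser is actually implemented, so this only illustrates the general technique. The function names and the sample log lines are made up for the example; the only thing taken from the real format is that W3C-style IIS logs contain directive lines starting with '#'.

```python
import io
from typing import Iterator, TextIO


def iter_entries(stream: TextIO) -> Iterator[str]:
    """Yield one log line at a time, skipping blank lines and '#' directives."""
    for raw in stream:  # text streams are iterated lazily, one line at a time
        line = raw.rstrip("\r\n")
        if line and not line.startswith("#"):
            yield line


def count_where(stream: TextIO, needle: str) -> int:
    """Count entries containing `needle` without keeping earlier lines around."""
    return sum(1 for entry in iter_entries(stream) if needle in entry)


# Made-up sample data; a real caller would pass an open file instead.
sample = io.StringIO(
    "#Fields: date time cs-method cs-uri-stem sc-status\n"
    "2009-01-01 00:00:01 GET /index.html 200\n"
    "2009-01-01 00:00:02 GET /missing 404\n"
    "2009-01-01 00:00:03 POST /login 200\n"
)
print(count_where(sample, " 404"))
```

Because the generator hands out one line at a time, memory use stays roughly constant no matter how large the file is. A lazy query layer (which is what you would want for a Linq-style API) can sit on top of the same kind of iterator.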
You could, for example, work line by line and parse each line. For the actual parsing you can write a state machine or, if the requirements allow it, use a regex.
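As a rough sketch of the regex route (Python again), assuming a fixed, space-separated layout of date, time, method, URI and status. That layout is an assumption for the example; a real IIS log declares its columns in the '#Fields:' line, so the pattern would have to be adapted to whatever is listed there.

```python
import re
from typing import Optional

# Hypothetical fixed layout: date time method uri status
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<method>\S+) "
    r"(?P<uri>\S+) "
    r"(?P<status>\d{3})$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])  # convert so it can be compared numerically
    return entry


print(parse_line("2009-01-01 00:00:02 GET /missing 404"))
```

Compiling the pattern once and reusing it for every line matters when the file has millions of entries.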
Another approach would be a state machine that both reads and parses the data. This might be needed if a log entry can span more than one line.
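Here is a small sketch of such a combined reader/parser in Python. The format is invented for illustration: fields are separated by spaces, a double-quoted field may contain spaces and newlines, and an unquoted newline ends the entry. It is not the IIS format, and the state names are my own.

```python
import io
from enum import Enum, auto
from typing import Iterator, List, TextIO


class State(Enum):
    BETWEEN = auto()  # between fields
    FIELD = auto()    # inside an unquoted field
    QUOTED = auto()   # inside a quoted field (may contain newlines)


def read_records(stream: TextIO) -> Iterator[List[str]]:
    """Read character by character and yield one list of fields per entry."""
    state = State.BETWEEN
    fields: List[str] = []
    buf: List[str] = []
    while True:
        ch = stream.read(1)
        if ch == "":  # end of input
            break
        if state is State.QUOTED:
            if ch == '"':  # closing quote ends the field
                fields.append("".join(buf))
                buf = []
                state = State.BETWEEN
            else:  # anything else, including a newline, is field content
                buf.append(ch)
        elif state is State.FIELD:
            if ch in " \n":  # space or newline ends the field
                fields.append("".join(buf))
                buf = []
                state = State.BETWEEN
                if ch == "\n":  # an unquoted newline also ends the entry
                    yield fields
                    fields = []
            else:
                buf.append(ch)
        else:  # State.BETWEEN
            if ch == "\n":
                if fields:
                    yield fields
                    fields = []
            elif ch == '"':
                state = State.QUOTED
            elif ch != " ":
                buf.append(ch)
                state = State.FIELD
    # Flush whatever is left at end of input (including an unterminated quote).
    if state is not State.BETWEEN:
        fields.append("".join(buf))
    if fields:
        yield fields


sample = io.StringIO('GET /a "line one\nline two" 500\nPOST /b "ok" 200\n')
for record in read_records(sample):
    print(record)
```

Reading one character per call is slow in Python; a real implementation would read larger blocks into a buffer and run the same state transitions over that buffer. The structure of the state machine stays the same.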
Some state machine related links:
A very simple state machine written in C: http://snippets.dzone.com/posts/show/3793
A lot of Python-related code, but some sections are universally applicable: http://www.ibm.com/developerworks/library/l-python-state.html