Displaying large text files with WPF C#


I'm trying to write a WPF application to display (possibly) large log files (50MB-2GB) such that they are easier to read. I tried loading a 5 MB file with ~75k lines into a GridView with TextBlocks but it was really slow. I don't need any editing capabilities.

I came across GlyphRun but couldn't figure out how to use it. I imagine I would have to fill a canvas or image with a GlyphRun for each line of my log file. Could anyone tell me how to do this? Unfortunately, the documentation on GlyphRun is not very helpful.


Solution

  • I have this file reading algorithm from a proof-of-concept application (which was also a log file viewer/diff viewer). The implementation requires C# 8.0 (.NET Core 3.x or .NET 5). I removed some indexing, cancellation etc. to reduce noise and to show the core of the algorithm.
    It performs quite fast and compares very well with editors like Visual Studio Code; there is not much room left to make it faster. To keep the UI responsive, I highly recommend using UI virtualization. Once UI virtualization is in place, the bottleneck will be the file reading operation. You can tune the algorithm's performance by using different partition sizes (you can implement some smart partitioning to calculate them dynamically).
    The key parts of the algorithm are:

    • asynchronous implementation of Producer-Consumer pattern using Channel
    • partitioning of the source file into blocks of n bytes
    • parallel processing of file partitions (concurrent file reading)
    • merging the result document blocks and overlapping lines
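
    The partitioning and merging steps listed above can be tried in isolation, without channels or file I/O. The following is a minimal sketch, not the implementation shown in this answer: PartitionMergeSketch, SplitIntoBlocks and MergeBlocks are hypothetical names, the input is an in-memory byte array instead of a file, and it assumes that no multi-byte UTF-8 character is split by a partition boundary.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PartitionMergeSketch
{
    // Cuts the data into partitions of partitionSize bytes and splits each partition into lines.
    // HasOverflow is true when the last entry is a fragment that continues in the next partition.
    public static List<(List<string> Lines, bool HasOverflow)> SplitIntoBlocks(byte[] data, int partitionSize)
    {
        var blocks = new List<(List<string> Lines, bool HasOverflow)>();
        for (int lower = 0; lower < data.Length; lower += partitionSize)
        {
            int upper = Math.Min(lower + partitionSize, data.Length);
            var lines = new List<string>();
            int lineStart = lower;
            for (int i = lower; i < upper; i++)
            {
                if (data[i] == (byte)'\n')
                {
                    lines.Add(Encoding.UTF8.GetString(data, lineStart, i - lineStart));
                    lineStart = i + 1;
                }
            }

            // Bytes after the last '\n' belong to a line that was cut by the partition boundary
            bool endsInsideLine = lineStart < upper;
            if (endsInsideLine)
            {
                lines.Add(Encoding.UTF8.GetString(data, lineStart, upper - lineStart));
            }

            blocks.Add((lines, endsInsideLine && upper < data.Length));
        }

        return blocks;
    }

    // Concatenates the blocks in order and prepends each cut-off fragment
    // to the first line of the following block.
    public static List<string> MergeBlocks(IEnumerable<(List<string> Lines, bool HasOverflow)> blocks)
    {
        var document = new List<string>();
        string pending = string.Empty; // fragment carried over from the previous block
        foreach (var (lines, hasOverflow) in blocks)
        {
            for (int i = 0; i < lines.Count; i++)
            {
                string line = i == 0 ? pending + lines[i] : lines[i];
                bool isLastLine = i == lines.Count - 1;
                if (isLastLine && hasOverflow)
                {
                    pending = line; // still incomplete, keep carrying it
                }
                else
                {
                    document.Add(line.TrimEnd('\r'));
                    pending = string.Empty;
                }
            }
        }

        return document;
    }
}
```

    Running MergeBlocks(SplitIntoBlocks(bytes, 7)) on a small buffer is a quick way to check that lines cut by a partition boundary are restored, including a "\r\n" pair that is split between two partitions.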

    DocumentBlock.cs
    The result struct that holds the lines of a processed file partition.

    using System.Collections.Generic;

    public readonly struct DocumentBlock
    {
      public DocumentBlock(long rank, IList<string> content, bool hasOverflow)
      {
        this.Rank = rank;
        this.Content = content;
        this.HasOverflow = hasOverflow;
      }
    
      public long Rank { get; }
      public IList<string> Content { get; }
      public bool HasOverflow { get; }
    }
    

    ViewModel.cs
    The entry point is the public ViewModel.ReadFileAsync member.

    Please note that the actual file handling algorithm belongs to the Model. I have only implemented it in the View Model to keep the example simple.

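    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using System.ComponentModel;
    using System.IO;
    using System.Linq;
    using System.Runtime.CompilerServices;
    using System.Text;
    using System.Threading;
    using System.Threading.Channels;
    using System.Threading.Tasks;

    class ViewModel : INotifyPropertyChanged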
    {
      public ViewModel() => this.DocumentBlocks = new ConcurrentBag<DocumentBlock>();
    
      // TODO::Make reentrant 
      // (for example cancel running operations and 
      // lock/synchronize the method using a SemaphoreSlim)
      public async Task ReadFileAsync(string filePath)
      {
        using var cancellationTokenSource = new CancellationTokenSource();
    
        this.DocumentBlocks.Clear();    
        this.EndOfFileReached = false;
    
        // Create the channel (Producer-Consumer implementation)
        BoundedChannelOptions channeloptions = new BoundedChannelOptions(Environment.ProcessorCount)
        {
          FullMode = BoundedChannelFullMode.Wait,
          AllowSynchronousContinuations = false,
          SingleWriter = true
        };
    
        var channel = Channel.CreateBounded<(long PartitionLowerBound, long PartitionUpperBound)>(channeloptions);
    
        // Create consumer threads
        var tasks = new List<Task>();
        for (int threadIndex = 0; threadIndex < Environment.ProcessorCount; threadIndex++)
        {
          Task task = Task.Run(async () => await ConsumeFilePartitionsAsync(channel.Reader, filePath, cancellationTokenSource));
          tasks.Add(task);
        }
    
        // Produce document byte blocks
        await ProduceFilePartitionsAsync(channel.Writer, cancellationTokenSource.Token);    
        await Task.WhenAll(tasks);    
        CreateFileContent();
        this.DocumentBlocks.Clear();
      }
    
      private void CreateFileContent()
      {
        var document = new List<string>();
        string overflowingLineContent = string.Empty;
        bool isOverflowMergePending = false;
    
        var orderedDocumentBlocks = this.DocumentBlocks.OrderBy(documentBlock => documentBlock.Rank);
        foreach (var documentBlock in orderedDocumentBlocks)
        {
          if (isOverflowMergePending)
          {
            // The overflow is the beginning of the line, so it must be prepended
            documentBlock.Content[0] = overflowingLineContent + documentBlock.Content[0];
            isOverflowMergePending = false;
          }
    
          if (documentBlock.HasOverflow)
          {
            overflowingLineContent = documentBlock.Content.Last();
            documentBlock.Content.RemoveAt(documentBlock.Content.Count - 1);
            isOverflowMergePending = true;
          }
    
          document.AddRange(documentBlock.Content);
        }
    
        this.FileContent = new ObservableCollection<string>(document);
      }
    
      private async Task ProduceFilePartitionsAsync(
        ChannelWriter<(long PartitionLowerBound, long PartitionUpperBound)> channelWriter, 
        CancellationToken cancellationToken)
      {
        var iterationCount = 0;
        while (!this.EndOfFileReached)
        {
          try
          {
            var partition = (iterationCount++ * ViewModel.PartitionSizeInBytes,
              iterationCount * ViewModel.PartitionSizeInBytes);
            await channelWriter.WriteAsync(partition, cancellationToken);
          }
          catch (OperationCanceledException)
          {}
        }
        channelWriter.Complete();
      }
    
      private async Task ConsumeFilePartitionsAsync(
        ChannelReader<(long PartitionLowerBound, long PartitionUpperBound)> channelReader, 
        string filePath, 
        CancellationTokenSource waitingChannelWriterCancellationTokenSource)
      {
        // Each consumer opens its own stream so that partitions can be read concurrently
        await using var file = File.OpenRead(filePath);
    
        await foreach ((long PartitionLowerBound, long PartitionUpperBound) filePartitionInfo
          in channelReader.ReadAllAsync())
        {
          if (filePartitionInfo.PartitionLowerBound >= file.Length)
          {
            // Tell the producer to stop and cancel its pending write
            this.EndOfFileReached = true;
            waitingChannelWriterCancellationTokenSource.Cancel();
            return;
          }
    
          var documentBlockLines = new List<string>();
          file.Seek(filePartitionInfo.PartitionLowerBound, SeekOrigin.Begin);
    
          // Clamp the last partition to the end of the file to avoid trailing zero bytes in the buffer
          long partitionUpperBound = Math.Min(filePartitionInfo.PartitionUpperBound, file.Length);
          var filePartition = new byte[partitionUpperBound - filePartitionInfo.PartitionLowerBound];
          await file.ReadAsync(filePartition, 0, filePartition.Length);
    
          // Extract lines
          bool isLastLineComplete = ExtractLinesFromFilePartition(filePartition, documentBlockLines); 
    
          bool documentBlockHasOverflow = !isLastLineComplete && file.Position != file.Length;
          var documentBlock = new DocumentBlock(filePartitionInfo.PartitionLowerBound, documentBlockLines, documentBlockHasOverflow);
          this.DocumentBlocks.Add(documentBlock);
        }
      }  
    
      private bool ExtractLinesFromFilePartition(byte[] filePartition, List<string> resultDocumentBlockLines)
      {
        bool isLineFound = false;
        for (int bufferIndex = 0; bufferIndex < filePartition.Length; bufferIndex++)
        {
          isLineFound = false;
          int lineBeginIndex = bufferIndex;
          while (bufferIndex < filePartition.Length
            && !(isLineFound = ((char)filePartition[bufferIndex]).Equals('\n')))
          {
            bufferIndex++;
          }
    
          int lineByteCount = bufferIndex - lineBeginIndex;
          if (lineByteCount.Equals(0))
          {
            resultDocumentBlockLines.Add(string.Empty);
          }
          else
          {
            var lineBytes = new byte[lineByteCount];
            Array.Copy(filePartition, lineBeginIndex, lineBytes, 0, lineBytes.Length);
            string lineContent = Encoding.UTF8.GetString(lineBytes).Trim('\r');
            resultDocumentBlockLines.Add(lineContent);
          }
        }      
    
        return isLineFound;
      }
    
      protected virtual void OnPropertyChanged([CallerMemberName] string propertyName = "") 
        => this.PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    
      public event PropertyChangedEventHandler PropertyChanged;
      private const long PartitionSizeInBytes = 100000;
      private bool EndOfFileReached { get; set; }
      private ConcurrentBag<DocumentBlock> DocumentBlocks { get; }
    
      private ObservableCollection<string> fileContent;
      public ObservableCollection<string> FileContent
      {
        get => this.fileContent;
        set
        {
          this.fileContent = value;
          OnPropertyChanged();
        }
      }
    }
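
    The producer above keeps writing partitions until a consumer reports the end of the file. A possible simplification, shown here only as a sketch and not as the method used in this answer, is to pass the file length to the producer so that it can stop on its own and no cancellation round trip is needed. ChannelPartitionSketch and ProcessAsync are hypothetical names, and the consumers only record the partition bounds instead of reading a file.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelPartitionSketch
{
    public static async Task<List<(long Lower, long Upper)>> ProcessAsync(
        long fileLength, long partitionSize, int consumerCount)
    {
        var channel = Channel.CreateBounded<(long Lower, long Upper)>(
            new BoundedChannelOptions(consumerCount)
            {
                FullMode = BoundedChannelFullMode.Wait,
                SingleWriter = true
            });
        var processed = new ConcurrentBag<(long Lower, long Upper)>();

        // Consumers: each one drains the channel until the writer completes it
        var consumers = new List<Task>();
        for (int i = 0; i < consumerCount; i++)
        {
            consumers.Add(Task.Run(async () =>
            {
                await foreach (var partition in channel.Reader.ReadAllAsync())
                {
                    // A real consumer would read and split the partition bytes here
                    processed.Add(partition);
                }
            }));
        }

        // Producer: the file length is known, so the loop ends without cancellation
        for (long lower = 0; lower < fileLength; lower += partitionSize)
        {
            long upper = Math.Min(lower + partitionSize, fileLength);
            await channel.Writer.WriteAsync((lower, upper));
        }
        channel.Writer.Complete();

        await Task.WhenAll(consumers);
        return processed.OrderBy(partition => partition.Lower).ToList();
    }
}
```

    In a file-based version, fileLength could come from new FileInfo(filePath).Length, and the last partition is already clamped to the end of the file.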
    

    To get UI virtualization with little effort, this example uses a plain ListBox, which virtualizes its items by default. All mouse effects are removed from the ListBoxItem elements in order to get rid of the ListBox look and feel. An indeterminate progress indicator is highly recommended while the file is loading. You can enhance the example to allow multi-line text selection (e.g., to copy text to the clipboard).

    MainWindow.xaml

    <Window>
      <Window.DataContext>
        <ViewModel />
      </Window.DataContext>
    
      <ListBox ScrollViewer.VerticalScrollBarVisibility="Visible" 
               ItemsSource="{Binding FileContent}" 
               Height="400" >
        <ListBox.ItemContainerStyle>
          <Style TargetType="ListBoxItem">
            <Setter Property="Template">
              <Setter.Value>
                <ControlTemplate TargetType="ListBoxItem">
                  <ContentPresenter />
                </ControlTemplate>
              </Setter.Value>
            </Setter>
          </Style>
        </ListBox.ItemContainerStyle>
      </ListBox>
    </Window>
    

    If you are more advanced, you can implement your own document viewer, e.g., by extending VirtualizingPanel and using low-level text rendering. This lets you improve performance further if you are interested in text search and highlighting (in this context, stay far away from RichTextBox or FlowDocument, as they are too slow).
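
    As a rough idea of what such custom rendering could look like, here is a minimal sketch of an element that draws only the lines that fit into its own height. It uses FormattedText and DrawingContext.DrawText, which sit one level above GlyphRun but are much easier to use. LogLinesElement, its Show method, the font and the fixed line height are assumptions made for illustration; scrolling, selection and highlighting are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Windows;
using System.Windows.Media;

// Hypothetical element: renders a window of lines directly in OnRender
public class LogLinesElement : FrameworkElement
{
    private static readonly Typeface LineTypeface = new Typeface("Consolas");
    private const double LineFontSize = 12.0;
    private const double LineHeight = 16.0; // assumed fixed line height in device-independent pixels

    private IReadOnlyList<string> lines = Array.Empty<string>();
    private int firstVisibleLine;

    // Sets the data source and the index of the first line to draw
    public void Show(IReadOnlyList<string> lines, int firstVisibleLine)
    {
        this.lines = lines ?? Array.Empty<string>();
        this.firstVisibleLine = Math.Max(0, firstVisibleLine);
        InvalidateVisual(); // request a new OnRender pass
    }

    protected override void OnRender(DrawingContext drawingContext)
    {
        base.OnRender(drawingContext);

        double pixelsPerDip = VisualTreeHelper.GetDpi(this).PixelsPerDip;
        int visibleLineCount = (int)Math.Ceiling(this.ActualHeight / LineHeight);
        int lastLine = Math.Min(this.lines.Count, this.firstVisibleLine + visibleLineCount);

        // Only the lines inside the viewport are formatted and drawn
        for (int lineIndex = this.firstVisibleLine; lineIndex < lastLine; lineIndex++)
        {
            var text = new FormattedText(
                this.lines[lineIndex],
                CultureInfo.CurrentUICulture,
                FlowDirection.LeftToRight,
                LineTypeface,
                LineFontSize,
                Brushes.Black,
                pixelsPerDip);

            double y = (lineIndex - this.firstVisibleLine) * LineHeight;
            drawingContext.DrawText(text, new Point(0, y));
        }
    }
}
```

    A ScrollBar or mouse wheel handler could call Show with a new first line index; since ObservableCollection<string> implements IReadOnlyList<string>, the FileContent collection can be passed in directly.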

    At the very least, you now have a well-performing text file reading algorithm that you can use to generate the data source for your UI implementation.

    If this viewer is not your main product, but a simple development tool to aid you in processing log files, I don't recommend implementing your own log file viewer. There are plenty of free and paid applications out there.