c# linux mono streamreader

StreamReader's BaseStream position using Mono


I am trying to write some simple code to index some Wikipedia XML pages. The idea was to get the byte offset of each character by reading one character at a time with StreamReader, then saving the position of the underlying byte stream so that I could seek back to that position later.

I am using a short test file that contains just "感\na\nb\n" (8 bytes in UTF-8), that is, a newline after each character. I then ran this code in the main function:

using System;
using System.IO;

namespace indexer
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            // The using block closes the reader and the underlying file when done.
            using (StreamReader sr = new StreamReader(@"/home/chris/Documents/len.txt"))
            {
                Console.Out.WriteLine(" length of file is " + sr.BaseStream.Length + " bytes ");
                sr.Read(); // read the first character (not the first byte)
                Console.Out.WriteLine(" current position is  " + sr.BaseStream.Position);
            }
        }
    }
}

This gives the output:

length of file is 8 bytes 
current position is  8

I expected the position to be 3, since only the first character (感, which is three bytes in UTF-8) should have been read. If I call sr.Read() again, I do get the next character correctly, but the position remains 8.

Am I misunderstanding how this should work, or have I discovered a bug of some sort?

Thank you.


Solution

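  • No, it is not a bug. StreamReader keeps an internal buffer (1 KB by default) that it fills from the underlying stream when you call StreamReader.Read(). BaseStream.Position therefore reports how far the stream has been read to fill that buffer, not how many bytes the characters you have consumed so far occupy. Your 8-byte file fits in the buffer, so the first Read() pulls in the whole file and the position jumps to 8.

    To track byte offsets yourself, call Encoding.GetByteCount() on each character or string you read and add up the results. The current encoding is available from StreamReader.CurrentEncoding.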
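As a rough illustration of the GetByteCount() approach, here is a minimal sketch that keeps a running byte offset while reading character by character. The names OffsetIndexer and CharOffsets are made up for this example, and it reads from a MemoryStream holding the same 8 bytes as the test file rather than from the file itself. It assumes the input is valid text with no byte order mark; if the file starts with a BOM, StreamReader skips it and the offsets would need to be shifted by the BOM length.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class OffsetIndexer
{
    // Hypothetical helper: returns the byte offset at which each character starts.
    public static List<long> CharOffsets(Stream stream, Encoding encoding)
    {
        var offsets = new List<long>();
        using (var sr = new StreamReader(stream, encoding))
        {
            long offset = 0; // tracked by hand; BaseStream.Position is not usable here
            int c;
            while ((c = sr.Read()) != -1)
            {
                string s = ((char)c).ToString();

                // Characters outside the BMP arrive as a surrogate pair (two chars).
                // Count both halves together, otherwise GetByteCount miscounts them.
                if (char.IsHighSurrogate((char)c) && sr.Peek() != -1
                    && char.IsLowSurrogate((char)sr.Peek()))
                {
                    s += (char)sr.Read();
                }

                offsets.Add(offset);
                offset += sr.CurrentEncoding.GetByteCount(s);
            }
        }
        return offsets;
    }

    static void Main()
    {
        // Same content as the test file in the question, encoded without a BOM.
        byte[] data = new UTF8Encoding(false).GetBytes("感\na\nb\n");

        foreach (long o in CharOffsets(new MemoryStream(data), Encoding.UTF8))
            Console.WriteLine(o); // prints 0, 3, 4, 5, 6, 7 on separate lines
    }
}
```

To jump back to a saved offset later, one option is to set BaseStream.Position to that offset and then call StreamReader.DiscardBufferedData(), so the reader drops its stale buffer and refills from the new position.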