
Return StreamReader to the beginning when its BaseStream has a BOM


I'm looking for an infallible way to reset a StreamReader to the beginning, particularly when its underlying BaseStream starts with a BOM; it must also work when no BOM is present. Creating a new StreamReader that reads from the beginning of the stream is also acceptable.

The original StreamReader can have been created with any encoding and with detectEncodingFromByteOrderMarks set to either true or false. Also, a read may or may not have been done prior to calling reset.

The Stream can contain arbitrary text. A file starting with the bytes 0xef,0xbb,0xbf can be a file with a BOM or a file that starts with a valid sequence of characters (for example ï»¿ if ISO-8859-1 encoding is used), depending on the parameters used when the StreamReader was created.
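To make the ambiguity concrete, here is a small sketch of how the same three bytes decode under the two encodings. The class and member names (`BomBytesDemo`, `AsLatin1`, `AsUtf8`, `IsUtf8Preamble`) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical demo class: decodes the same three bytes in two ways.
static class BomBytesDemo
{
    public static readonly byte[] Bytes = { 0xef, 0xbb, 0xbf };

    // In ISO-8859-1 every byte maps to one character: U+00EF, U+00BB, U+00BF.
    public static string AsLatin1() =>
        Encoding.GetEncoding("ISO-8859-1").GetString(Bytes);

    // Encoding.GetString does not strip a BOM, so UTF-8 decoding yields
    // the single character U+FEFF.
    public static string AsUtf8() =>
        Encoding.UTF8.GetString(Bytes);

    // The same bytes are exactly the preamble that Encoding.UTF8 reports.
    public static bool IsUtf8Preamble() =>
        Encoding.UTF8.GetPreamble().SequenceEqual(Bytes);
}
```

So nothing in the bytes themselves says whether they are a BOM or text; only the reader's encoding and detection settings decide.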

I've seen other solutions, but they don't work properly when the BaseStream starts with a BOM. The StreamReader remembers that it has already detected the BOM, so after the reset the first character returned by a read is the special BOM character instead of the first character of the text.

I could also create a new StreamReader, but I have no way of knowing whether the original StreamReader was created with detectEncodingFromByteOrderMarks set to true or to false.

This is what I tried first:

    //fails with TestMethod1
    void ResetStream1(ref StreamReader sr) {
        sr.BaseStream.Position = 0;
        sr.DiscardBufferedData();
    }

    //fails with TestMethod2
    void ResetStream2(ref StreamReader sr) {
        sr.BaseStream.Position = 0;
        sr = new StreamReader(sr.BaseStream, sr.CurrentEncoding, true);
    }

    //fails with TestMethod3
    void ResetStream3(ref StreamReader sr) {
        sr.BaseStream.Position = 0;
        sr = new StreamReader(sr.BaseStream, sr.CurrentEncoding, false);
    }

And these are the test methods (each one calls the ResetStream variant under test):

    Stream StreamWithBOM = new MemoryStream(new byte[] {0xef,0xbb,0xbf,(byte)'X'});


    [TestMethod]
    public void TestMethod1() {
        StreamReader sr=new StreamReader(StreamWithBOM);
        int before=sr.Read(); //reads X

        ResetStream(ref sr);
        int after=sr.Read();

        Assert.AreEqual(before, after);
    }

    [TestMethod]
    public void TestMethod2() {
        StreamReader sr = new StreamReader(StreamWithBOM,Encoding.GetEncoding("ISO-8859-1"),false);
        int before = sr.Read(); //reads ï

        ResetStream(ref sr);
        int after = sr.Read();

        Assert.AreEqual(before, after);
    }

    [TestMethod]
    public void TestMethod3() {
        StreamReader sr = new StreamReader(StreamWithBOM, Encoding.GetEncoding("ISO-8859-1"), true);
        int expected = (int)'X'; //no Read() done before reset

        ResetStream(ref sr);
        int after = sr.Read();

        Assert.AreEqual(expected, after);
    }

Finally, I found a solution (see my own answer) that passes all 3 tests, but I'd like to know whether a more elegant or faster solution is possible.


Solution

    //passes all 3 tests
    void ResetStream(ref StreamReader sr){
        sr.Read(); //ensure that BOM is detected if configured to do so
        sr.BaseStream.Position=0;
        sr=new StreamReader(sr.BaseStream, sr.CurrentEncoding, false);
    }
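If replacing the reader through a ref parameter feels awkward, the same idea can be sketched as a helper that returns the new reader. This is only a variation on the answer above, not a different technique. The names `StreamReaderRewind` and `Rewind` are hypothetical, the buffer size of 1024 is an arbitrary choice, and the `leaveOpen` constructor overload assumes .NET Framework 4.5 or later. It also assumes the underlying stream is seekable.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original answer.
static class StreamReaderRewind
{
    public static StreamReader Rewind(StreamReader reader)
    {
        if (!reader.BaseStream.CanSeek)
            throw new NotSupportedException("The underlying stream must be seekable.");

        // Force BOM detection (if the reader was configured for it) so that
        // CurrentEncoding reflects the encoding actually in use.
        reader.Read();

        reader.BaseStream.Position = 0;

        // Detection is turned off for the new reader: the encoding is already
        // settled. leaveOpen: true lets the caller dispose either reader
        // without closing the shared stream.
        return new StreamReader(reader.BaseStream, reader.CurrentEncoding,
                                false, 1024, true);
    }
}
```

Usage would look like `sr = StreamReaderRewind.Rewind(sr);`. The old reader should not be read from afterwards, since both readers share the same stream position.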