Update:
I can confirm that the behavior described below was down to something I had not mentioned in the question: I was manually manipulating the reader's private charPos field. The question could therefore be renamed "How to break your perfectly working Read(buffer, int, int) method", and the answer is: manually set the reader's (SR1) position past the stream's (FSr) internal buffer size (not to be confused with the buffer passed to the read operation). Specifically, I had added:
Before the loop (in the code in the original question):

    System.Reflection.FieldInfo charPos_private = typeof(StreamReader).GetField("charPos", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.DeclaredOnly);

and within the loop:

    charPos_private.SetValue(SR1, string_index);
The reader's charPos actually counts up to 1024 and then resets to 0 when the FileStream buffers the next 1024 chars. I was setting the position manually (as I'm messing around with some patterns) and had not noticed that it can never go past 1024, so setting it to 1025 points outside the internal buffer.
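To make the failure mode concrete, here is a minimal sketch reconstructing how those two lines interact with the read loop from the original question (the file path is a placeholder, the field name charPos matches the .NET Framework StreamReader the question refers to, and string_index is the counter I was tracking manually):

    using System;
    using System.IO;
    using System.Reflection;
    using System.Text;

    class CharPosDemo
    {
        static void Main()
        {
            char[] buffer_search = new char[1];
            int string_index = 0; // position tracked manually while pattern matching

            // The reader's private charPos field (named "charPos" in .NET Framework).
            FieldInfo charPos_private = typeof(StreamReader).GetField(
                "charPos",
                BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);

            using (FileStream FSr = new FileStream("input.xml", FileMode.Open))
            using (StreamReader SR1 = new StreamReader(FSr, Encoding.UTF8))
            {
                while (SR1.Peek() != -1)
                {
                    SR1.Read(buffer_search, 0, 1);
                    string_index++;

                    // This is the line that breaks everything: charPos is an index into the
                    // reader's *internal* buffer (at most bufferSize, e.g. 1024), so once
                    // string_index passes that size the next Read/Peek indexes outside the
                    // internal buffer and throws IndexOutOfRangeException.
                    charPos_private.SetValue(SR1, string_index);
                }
            }
        }
    }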
And that's how you screw up simple stuff. Thanks a lot to everyone who commented, much appreciated! I'll accept the answer that shows how to do it correctly; the code I posted also works fine, had it not been for that couple of lines I had not mentioned.
Original question
First time around here,
I'm teaching myself C#. I'm trying to use a StreamReader to read a big UTF-8 file with Linux LF line endings (\n) (an XML file) char by char (or block by block), perform some operations on it, and then write it into a new file char by char (or block by block). I have a StreamReader and a StreamWriter.
I will explain in words and add some code at the end:
I'm finding that the StreamReader Read() and Read(char[] buffer, int index, int count) methods perform differently on big files. I know those two are supposed to be just two different ways of calling the same method (I have also tried ReadBlock), but the situation is: the Read() method automatically refills the StreamReader object's ByteBuffer (array) dynamically, that is, when the StreamReader's position reaches the default bufferSize (usually 1024 or 4096), the method automatically buffers the next 1024 or 4096 or whatever the buffer size is.
But Read(char[] buffer, int index, int count) doesn't seem to do that automatically, so it throws an exception when the StreamReader's position reaches bufferSize + 1, i.e. at char position 1025 or 4097 (System.IndexOutOfRangeException in System.Buffer.InternalBlockCopy(Array src, Int32 srcOffsetBytes, Array dst, Int32 dstOffsetBytes, Int32 byteCount)), or if I try to Peek() to see what is next (System.IndexOutOfRangeException in System.IO.StreamReader.Peek()). My test file is 300 MB.
*The question is: how do I get Read(char[] buffer, int index, int count) to automatically re-buffer the ByteBuffer (the StreamReader's non-public ByteBuffer member) so as to effectively read a file bigger than the buffer size? Or, in other words: how do I actually read a big file with Read(buffer_search, 0, x_number_of_chars)?*
I don't know whether I'd need to manually modify the ByteBuffer via reflection, or how I'd do it. It should be automatic; re-buffering manually would be far too much work for such a simple thing.
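(For reference, the bufferSize mentioned above is the one you can pass to the StreamReader constructor; a minimal sketch of that overload, with a placeholder path and an arbitrary 8192-char value:)

    // StreamReader(Stream, Encoding, detectEncodingFromByteOrderMarks, bufferSize) lets you
    // choose the size of the reader's internal buffer explicitly instead of the default.
    using (FileStream FSr = new FileStream("input.xml", FileMode.Open))
    using (StreamReader SR1 = new StreamReader(FSr, System.Text.Encoding.UTF8, true, 8192))
    {
        // Read as usual; the buffer size only affects how often the reader refills
        // from the FileStream, not how much of the file can be read.
    }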
In code: (I'm paraphrasing some code here)
doing something like:
    char current_char;
    // Writer: create the new destination file.
    using (System.IO.FileStream FSw = new FileStream(destinationPath, FileMode.Create))
    {
        using (System.IO.StreamWriter SW1 = new StreamWriter(FSw, System.Text.Encoding.UTF8))
        {
            // Reader: open the big source file.
            using (FileStream FSr = new FileStream(sourcePath, FileMode.Open))
            {
                using (StreamReader SR1 = new StreamReader(FSr, System.Text.Encoding.UTF8))
                {
                    // Read() returns -1 at end of stream, which casts to '\uffff'.
                    while ((current_char = (char)SR1.Read()) != '\uffff')
                    {
                        SW1.Write(current_char);
                    }
                }
            }
        }
    }
That code is successful and has no problems: the big file is read and written into a new file.
But when I try to specify the number of chars to read (I actually have to read a user-defined number of chars; I'm just showing code that reads one char at a time to keep it simple), I need to use Read(char[] buffer, int index, int count), like this:
    char[] buffer_search = new char[1];
    using (System.IO.FileStream FSw = new FileStream(fePath, FileMode.Create))
    {
        using (System.IO.StreamWriter SW1 = new StreamWriter(FSw, System.Text.Encoding.UTF8))
        {
            using (FileStream FSr = new FileStream(fPath, FileMode.Open))
            {
                using (StreamReader SR1 = new StreamReader(FSr, System.Text.Encoding.UTF8))
                {
                    // Peek() returns -1 at end of stream.
                    while (SR1.Peek() != -1)
                    {
                        SR1.Read(buffer_search, 0, 1);
                        SW1.Write(buffer_search[0]);
                    }
                }
            }
        }
    }
That code will end with an exception (System.IndexOutOfRangeException in System.IO.StreamReader.Peek()) when the StreamReader's position reaches and passes the buffer size (i.e. 1025, 4097, etc.). It is obviously peeking at what it has in the buffer, not at the file itself, and since it doesn't automatically re-buffer, it ends up peeking outside the ByteBuffer char[].
If I do something like this:
    char[] buffer_search = new char[1];
    bool end_of_file = false;
    using (System.IO.FileStream FSw = new FileStream(fePath, FileMode.Create))
    {
        using (System.IO.StreamWriter SW1 = new StreamWriter(FSw, System.Text.Encoding.UTF8))
        {
            using (FileStream FSr = new FileStream(fPath, FileMode.Open))
            {
                using (StreamReader SR1 = new StreamReader(FSr, System.Text.Encoding.UTF8))
                {
                    while (!end_of_file)
                    {
                        // Swallow the exception and treat it as end of file.
                        try { SR1.Read(buffer_search, 0, 1); }
                        catch { end_of_file = true; }
                        SW1.Write(buffer_search[0]);
                    }
                }
            }
        }
    }
Then I end up with a file that contains only 1024 chars, or whatever the buffer size is, and the (caught) exception that is thrown is: System.IndexOutOfRangeException in System.Buffer.InternalBlockCopy(Array src, Int32 srcOffsetBytes, Array dst, Int32 dstOffsetBytes, Int32 byteCount) at System.IO.StreamReader.Read(Char[] buffer, Int32 index, Int32 count).
So in both cases the result is the same: the buffer is not getting new data from the file, something that the Read() and ReadLine() methods handle automatically.
Simple solutions like increasing the buffer size won't work, as my file is in the hundreds of MB and I'm trying to be memory efficient (and neither will simply using Read() instead, as I need to use Read(buffer, 0, x_number_of_chars)). This should be a simple thing and it's taking longer than expected.
Thanks for your help,
Answer
It's really unclear what you're asking. But if you want to read an arbitrary number of characters from one stream reader and write them to a writer, this works:
    int charsRead;
    do
    {
        // Read returns the number of characters actually read (0 at end of stream).
        charsRead = SR1.Read(buffer_search, 0, buffer_search.Length);
        if (charsRead > 0)
        {
            // TODO: process buffer_search in some way.
            SW1.Write(buffer_search, 0, charsRead);
        }
    } while (charsRead > 0);
That will read new characters into the internal stream reader buffer when needed.
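For completeness, here is a self-contained version of that pattern, under the same setup as in the question (the file paths and the 4096-char chunk size are placeholders):

    using System.IO;
    using System.Text;

    class CopyInChunks
    {
        static void Main()
        {
            char[] buffer_search = new char[4096]; // read up to 4096 chars per call

            using (FileStream FSr = new FileStream("input.xml", FileMode.Open))
            using (StreamReader SR1 = new StreamReader(FSr, Encoding.UTF8))
            using (FileStream FSw = new FileStream("output.xml", FileMode.Create))
            using (StreamWriter SW1 = new StreamWriter(FSw, Encoding.UTF8))
            {
                int charsRead;
                do
                {
                    // Read refills the reader's internal buffer from the FileStream as
                    // needed, so this copies files far larger than that buffer.
                    charsRead = SR1.Read(buffer_search, 0, buffer_search.Length);
                    if (charsRead > 0)
                    {
                        // TODO: process buffer_search[0..charsRead) here if needed.
                        SW1.Write(buffer_search, 0, charsRead);
                    }
                } while (charsRead > 0);
            }
        }
    }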