I have a text file with several thousand rows. Each row holds a student ID, a test, and that student's score on the test. Not all students have taken the same number of tests (i.e., rows). I need to break the file into smaller chunks without splitting any student's group of scores across two chunks. No sorting is necessary, since the main file is already sorted, but we sort it for good measure.
Let's say I want each chunk to have at least 5 rows, but if the 6th row belongs to the same student as the 5th row, the 6th row goes into the chunk too, and so on until the student changes.
Then a new chunk starts (with the headers, but that part is easy), and the process repeats until the end of the original file is reached.
I'm fine with either LINQ or FileStream. Once I have each chunk, I will load it into an app via an API.
Here is a simplified sample of the main file:
STUDENT_ID TEST SCORE
000001 A 10
000001 B 10
000001 C 10
000001 D 10
000002 A 10
000002 B 10
000002 C 10
000002 D 10
000003 A 10
000003 B 10
000004 C 10
000004 D 10
000004 E 10
000004 F 10
So, the first chunk would look like this (the 5th row belongs to student 000002, so all of 000002's rows are included):
STUDENT_ID TEST SCORE
000001 A 10
000001 B 10
000001 C 10
000001 D 10
000002 A 10
000002 B 10
000002 C 10
000002 D 10
So far I've written one while loop that uses a constant "rowsToTake" = 5, a Substring(0, 6) to compare the STUDENT_ID of the 5th row taken, and a "currentPosition" counter that increments on each take. I lost momentum on the outer loop that produces the subsequent chunks. I chose not to post my code so far, because I don't think it's good and I don't want anyone to feel they should build on it.
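The question describes an index-based loop (a constant rowsToTake, Substring(0, 6), and a currentPosition counter). Here is a minimal sketch of how an outer loop could wrap that take-and-extend step. SplitIntoChunks and the in-memory sample array are hypothetical, and the sketch assumes the header row has already been removed, the rows are sorted by student, and STUDENT_ID is always the first six characters of a row.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits data rows (header excluded) into chunks of at least
// rowsToTake rows, never splitting one student's rows across two chunks.
// Assumes rows are sorted by student and STUDENT_ID is the first 6 characters.
List<List<string>> SplitIntoChunks(string[] rows, int rowsToTake)
{
    var chunks = new List<List<string>>();
    int currentPosition = 0;

    // Outer loop: one iteration per chunk.
    while (currentPosition < rows.Length)
    {
        // Take up to rowsToTake rows (fewer if the file runs out).
        int end = Math.Min(currentPosition + rowsToTake, rows.Length);

        // Keep extending while the next row belongs to the same student
        // as the last row taken.
        string lastId = rows[end - 1].Substring(0, 6);
        while (end < rows.Length && rows[end].Substring(0, 6) == lastId)
        {
            end++;
        }

        chunks.Add(new List<string>(rows[currentPosition..end]));
        currentPosition = end;   // the next chunk starts where this one ended
    }
    return chunks;
}

string[] sample =
{
    "000001 A 10", "000001 B 10", "000001 C 10", "000001 D 10",
    "000002 A 10", "000002 B 10", "000002 C 10", "000002 D 10",
    "000003 A 10", "000003 B 10",
    "000004 C 10", "000004 D 10", "000004 E 10", "000004 F 10",
};

foreach (var chunk in SplitIntoChunks(sample, 5))
{
    Console.WriteLine($"{chunk.Count} rows: {chunk[0]} .. {chunk[^1]}");
}
```

For the sample this yields two chunks of 8 and 6 rows; the header line would be prepended to each chunk before loading it.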
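I don't think a LINQ solution is a good fit for your scenario, because whether a chunk may end depends on the row before it. I would use a plain loop instead, comparing each line's STUDENT_ID with the previous line's and closing the chunk once it has at least 5 rows and the student changes.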
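Here is a C# sketch of that loop. LoadChunk and the scores.txt path are hypothetical placeholders for your API call and file, and STUDENT_ID is assumed to be the first six characters of each row, as in the sample:
using System.Collections.Generic;
using System.IO;

const int MinRows = 5;
string header = null;
string previousStudentID = null;
var chunk = new List<string>();
foreach (string line in File.ReadLines("scores.txt"))  // hypothetical path
{
    if (header == null) { header = line; continue; }   // first line is the header, not a student
    string studentID = line.Substring(0, 6);           // parse STUDENT_ID from the row
    if (studentID != previousStudentID && chunk.Count >= MinRows)
    {
        chunk.Insert(0, header);     // add header to beginning of chunk
        LoadChunk(chunk);            // load chunk to API
        chunk = new List<string>();  // start a new chunk
    }
    chunk.Add(line);                 // add line to chunk
    previousStudentID = studentID;
}
if (chunk.Count > 0)                 // load the remaining rows, if any
{
    chunk.Insert(0, header);
    LoadChunk(chunk);
}

void LoadChunk(List<string> rows) { /* hypothetical: send rows to your API */ }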
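The same loop can also be written as an iterator that yields each chunk instead of calling the API directly, so the chunking logic can be tested on its own and the caller decides what to do with each chunk. File.ReadLines reads the file lazily, so the whole file never has to be in memory. This is a minimal sketch: ReadChunks and the temp-file usage are hypothetical, and it assumes the first line is the header, columns are separated by spaces or tabs, STUDENT_ID is the first column, and rows are sorted by student.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: lazily yields chunks of at least minRows data rows, each with
// the header line prepended, closing a chunk only when the student ID changes.
// Assumes a header first line, space/tab-separated columns with STUDENT_ID first,
// and rows already sorted by student.
IEnumerable<List<string>> ReadChunks(string path, int minRows)
{
    string header = null;
    string previousStudentID = null;
    var chunk = new List<string>();

    foreach (string line in File.ReadLines(path))   // reads one line at a time
    {
        if (header == null) { header = line; continue; }
        if (string.IsNullOrWhiteSpace(line)) continue;   // skip blank lines

        string studentID = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)[0];
        if (studentID != previousStudentID && chunk.Count >= minRows)
        {
            chunk.Insert(0, header);
            yield return chunk;
            chunk = new List<string>();   // new list, so the yielded one is left untouched
        }
        chunk.Add(line);
        previousStudentID = studentID;
    }

    if (chunk.Count > 0)   // the final chunk may have fewer than minRows rows
    {
        chunk.Insert(0, header);
        yield return chunk;
    }
}

// Usage: write the sample file to a temp path, then stream it back in chunks.
string filePath = Path.GetTempFileName();
File.WriteAllLines(filePath, new[]
{
    "STUDENT_ID TEST SCORE",
    "000001 A 10", "000001 B 10", "000001 C 10", "000001 D 10",
    "000002 A 10", "000002 B 10", "000002 C 10", "000002 D 10",
    "000003 A 10", "000003 B 10",
    "000004 C 10", "000004 D 10", "000004 E 10", "000004 F 10",
});

var loaded = new List<List<string>>();
foreach (List<string> c in ReadChunks(filePath, 5))
{
    loaded.Add(c);   // stand-in for loading the chunk through your API
    Console.WriteLine($"{c.Count - 1} data rows, from {c[1]} to {c[^1]}");
}
File.Delete(filePath);
```

For the sample this produces two chunks with 8 and 6 data rows, the first matching the chunk shown in the question. Creating a new list after each yield, rather than clearing the old one, matters here because the caller may still hold the chunk it was just given.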