Tags: linq, c#-4.0, filestream

Read rows from text file, take more lines if value matches


I have a text file with several thousand rows. Each row holds a student ID, a test, and the student's score on that test. Not all students have taken the same number of tests (i.e., they have different numbers of rows). I need to break the file up into smaller chunks without splitting any student's group of scores across two chunks. No sorting should be necessary, since the main file is already sorted, but we sort it anyway for good measure.

Let's say I want each chunk to have at least 5 rows, but if the 6th row belongs to the same student as the 5th row, the 6th row should be added to the chunk too, and so on, until the student changes.

Then I start a new chunk (with the headers, but that part is easy) and repeat until the end of the original file is reached.

I'm fine with either LINQ or FileStream. Once I have each chunk, I will load it into an app via an API.

Here is a simplified sample of the main file:

STUDENT_ID    TEST    SCORE
000001          A       10
000001          B       10
000001          C       10
000001          D       10
000002          A       10
000002          B       10
000002          C       10
000002          D       10
000003          A       10
000003          B       10
000004          C       10
000004          D       10
000004          E       10
000004          F       10

So, the first chunk would look like:

STUDENT_ID    TEST    SCORE
000001          A       10
000001          B       10
000001          C       10
000001          D       10
000002          A       10
000002          B       10
000002          C       10
000002          D       10

So far I've written one while loop that uses a constant rowsToTake = 5, a Substring(0, 6) call to compare the STUDENT_ID of the 5th row taken with the rows that follow it, and a currentPosition counter that increments on each take. I lost momentum on the outer loop that gets the subsequent chunks. I chose not to post my code, because I don't think it's good and I don't want anyone to feel they should build on it.
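For illustration, the index-based approach described above can be completed with an outer loop roughly as follows. This is a minimal sketch, not the asker's code: it assumes the whole file fits in memory (e.g. via File.ReadAllLines), that the first line is the header, and that STUDENT_ID occupies the first 6 characters of every row, as the Substring(0, 6) call implies. `ChunkByIndex` is a hypothetical name, the sample rows are hard-coded, and the snippet uses C# 9 top-level statements for brevity; on C# 4.0 you would move `ChunkByIndex` into a static class.

```csharp
// Sketch only: uses C# 9 top-level statements; ChunkByIndex is a hypothetical name.
using System;
using System.Collections.Generic;

// The sample file from the question, hard-coded; in practice use File.ReadAllLines(path).
string[] sampleLines =
{
    "STUDENT_ID    TEST    SCORE",
    "000001          A       10",
    "000001          B       10",
    "000001          C       10",
    "000001          D       10",
    "000002          A       10",
    "000002          B       10",
    "000002          C       10",
    "000002          D       10",
    "000003          A       10",
    "000003          B       10",
    "000004          C       10",
    "000004          D       10",
    "000004          E       10",
    "000004          F       10",
};

List<List<string>> chunks = ChunkByIndex(sampleLines, 5);
foreach (List<string> c in chunks)
    Console.WriteLine(string.Join(Environment.NewLine, c) + Environment.NewLine);

static List<List<string>> ChunkByIndex(string[] lines, int rowsToTake)
{
    string header = lines[0];
    var result = new List<List<string>>();
    int currentPosition = 1; // skip the header line

    // Outer loop: builds one chunk per iteration until the rows run out.
    while (currentPosition < lines.Length)
    {
        // Take up to rowsToTake rows starting at currentPosition.
        int end = Math.Min(currentPosition + rowsToTake, lines.Length);

        // Extend while the next row has the same STUDENT_ID as the last row taken.
        string lastId = lines[end - 1].Substring(0, 6);
        while (end < lines.Length && lines[end].Substring(0, 6) == lastId)
            end++;

        var chunk = new List<string> { header };
        for (int i = currentPosition; i < end; i++)
            chunk.Add(lines[i]);
        result.Add(chunk);

        currentPosition = end; // the next chunk starts at a new student's first row
    }
    return result;
}
```

On the sample data this yields two chunks: students 000001 and 000002 (8 rows, matching the expected first chunk), then students 000003 and 000004 (6 rows). The final chunk may be shorter than rowsToTake if the file runs out, and a blank or shorter-than-6-character row would make Substring throw, so real input may need filtering first.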


Solution

  • I do not think a LINQ solution is appropriate for your scenario. I would use a simple loop instead, comparing the STUDENT_ID on each line of your text file with the one on the previous line.


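    In C# (a sketch: it assumes the first line is the header and that STUDENT_ID is the first whitespace-delimited field; loadChunk stands in for your API call):

    using System;
    using System.Collections.Generic;
    using System.IO;

    static class StudentFileSplitter
    {
        public static void Split(string path, int minRows, Action<List<string>> loadChunk)
        {
            string header = null;
            string previousStudentID = null;
            var chunk = new List<string>();

            // File.ReadLines streams the file one line at a time
            foreach (string line in File.ReadLines(path))
            {
                if (header == null) { header = line; continue; }  // first line is the header
                if (string.IsNullOrWhiteSpace(line)) continue;     // ignore blank lines

                // STUDENT_ID is the first whitespace-delimited field
                string studentID = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)[0];

                // Close the chunk only once it has at least minRows rows AND the student changes
                if (chunk.Count >= minRows && studentID != previousStudentID)
                {
                    chunk.Insert(0, header);
                    loadChunk(chunk);
                    chunk = new List<string>();  // new list, in case loadChunk keeps a reference
                }

                chunk.Add(line);
                previousStudentID = studentID;
            }

            // Load the remaining rows, if any
            if (chunk.Count > 0)
            {
                chunk.Insert(0, header);
                loadChunk(chunk);
            }
        }
    }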