Search code examples
pythonpython-3.xpandasdataframeseries

Converting Pandas DataFrame into multiple distinct Pandas Series


I have been asked by my school district to schedule students for non-elective classes next year based on their grade level. Unfortunately, we don't have access to some of the neat scheduling apps that can do this automatically. Rather than rostering every student by hand, I thought I'd try using Python. I've written code before to speed up some low-level admin tasks but this is proving far more challenging than I'd anticipated.

My code needs to do the following:

  • Iterate over a list of students and their grade level
  • Randomly assign each student to one section of one class for each of our 6 periods
  • Ensure that students are not rostered for multiple sections of the same class (e.g., two sections of English 1)
  • Ensure that 2 of the 6 classes are called "Elective"
  • Ensure that no more than 30 students are assigned to a class section

Here's an example of the output I'd like to see:

Class 1 Section 1 Class 1 Section 2
Student 1 Student 3
Student 2 Student 4

Here's another format that my code could produce:

Students 1st Period 2nd Period
Student 1 Course 1 Section 1 Course 3 Section 2
Student 2 Course 2 Section 1 Course 4 Section 1
Student 3 Course 3 Section 1 Course 1 Section 2

I have a CSV of 500 students (names replaced with "Student X" for confidentiality reasons) and their associated grade level. I've converted this CSV into a Pandas DataFrame and written the following code. The direction I went was to make each course (e.g., English 1, Algebra 1, etc.) into a Pandas Series that would be concatenated to a larger DataFrame. For this example, I limited it to just one course type (English 1, aka "ELA 1") as a test:

import pandas as pd

s_test = 'schedule_test.csv' # Roster of students and grade levels

df_s_test = pd.read_csv(s_test)

class_size_limit = 30

stop = 0

df_shuffle = df_s_test.sample(frac=1)

ela1 = {'ELA Section 1':[],'ELA Section 2':[],'ELA Section 3':[],'ELA Section 4':[],'ELA Section 5':[],'ELA Section 6':[]}

sections = ['Section 1','Section 2','Section 3','Section 4','Section 5','Section 6']

for s in df_shuffle.index:
    for sec in sections:
        if df_shuffle.loc[s,'Grade Level']=='9th':
            if stop==0:
                if df_shuffle.loc[s,'Student'] not in ela1[sec]:
                    if len(ela1[sec]) < class_size_limit:
                        ela1[sec]+=[df_shuffle.loc[s,'Student']]
                        stop+=1
    stop = 0

ela1 =pd.Series(ela1)

df_ela1 = pd.DataFrame({})

df_ela1 = pd.concat([df_ela1,ela1.to_frame().T],ignore_index=True)

df_ela1.to_csv('test.csv',index=False)

Unfortunately, the DataFrame I've produced is composed of lists rather than individual strings of students:

ELA Section 1 ELA Section 2
[Student 1, Student 4...] [Student 10, Student 152...]

It also currently does not have a way to prevent a student from being rostered into a section that conflicts with a section they're already in. For example, my code must prevent Student 1 from being put into Algebra 1 Section 1 because it occurs during the same period as ELA Section 1.

This makes me think that I should organize my Pandas Series by the class sections in each period (e.g., all 9th grade 1st period classes, all 9th grade 2nd period classes, etc.) rather than the sections in each class type (e.g., all ELA courses, all Algebra courses, etc.).

I appreciate any help or guidance that you can offer.


Solution

  • I think the easiest way would be to have one data frame that will contain information about all students and all classess that they will attend. Once you will have such data frame you can get any information about attendences of the courses, such as which students attend "Course 1, Section 1" or what courses attend cirtain student. Final data frame will look like this:

    Student Name  Grade   course 1   course 2   course 3
    3     Student 4      9  Section 1  Section 1  Section 1
    18   Student 19      9  Section 1  Section 1  Section 1
    21   Student 22      9  Section 1  Section 1  Section 1
    24   Student 25      9  Section 1  Section 1  Section 1
    37   Student 38      9  Section 1  Section 1  Section 1
    41   Student 42      9  Section 1  Section 1  Section 1
    42   Student 43      9  Section 1  Section 1  Section 1
    94   Student 95      9  Section 1  Section 1  Section 1
    

    From where you can get information which student attends Course 1 Section1:

      Student Name
    3     Student 4
    18   Student 19
    21   Student 22
    24   Student 25
    37   Student 38
    41   Student 42
    42   Student 43
    94   Student 95
    

    Code

    import pandas as pd
    import random
    
    #maximum number of grades
    max_grade = 10
    #create an example dataframe
    students = pd.DataFrame({"Student Name": ["Student " + str(i+1) for i in range (100)], "Grade": [random.randint(1,
                                                                                                                   max_grade) for
                                                                                                     i in range (100)]})
    courses = ["course 1", "course 2", "course 3"]
    sections = ['Section 1','Section 2','Section 3', "Section 4"]
    grades = [i+1 for i in range(max_grade)]
    class_size_limit = 30
    
    #create columns with names of the courses and assign initial value "None"
    for course in courses:
        students[course] = "None"
    #randomly assign students of grade 9 to different courses and different sections
    for course in courses:
        for sec in sections:
            not_assigned_students = students[(students["Grade"] == 9) & (students[course] == "None")]
            if class_size_limit < not_assigned_students.shape[0]:
                sample_indexes = not_assigned_students.sample(n=class_size_limit).index
                students.loc[sample_indexes, course] = sec
            else:
                sample_indexes = not_assigned_students.index
                students.loc[sample_indexes, course] = sec
    
    #get students assigned to the "Course1, Section 1"
    students[students["course 1"]=="Section 1"][["Student Name"]]
    #get number of students assigned to the "Course1, Section 1"
    students[students["course 1"]=="Section 1"]["Student Name"].shape[0]