Search code examples
pythonpandasdataframenumpydata-analysis

Get total no of classes of each subject within a semester using pandas


Time table, columns=hour, rows=weekday, data=subject

[weekday x hour]

                               1                      2                 3             4                 5                      6                      7
Name                                                                                                                                                   
Monday                   Project                Project           Project  Data Science  Embedded Systems            Data Mining  Industrial Psychology
Tuesday                  Project                Project           Project       Project      Data Science  Industrial Psychology       Embedded Systems
Wednesday           Data Science                Project           Project       Project           Project                Project                Project
Thursday             Data Mining  Industrial Psychology  Embedded Systems   Data Mining           Project                Project                Project
Friday     Industrial Psychology       Embedded Systems      Data Science   Data Mining           Project                Project                Project

Frequency table rows=weekday, columns=subject, data = subject frequency in the corresponding weekday

[weekday x subject]

Data       Data Mining  Data Science  Embedded Systems  Industrial Psychology  Project
Name                                                                                  
Friday               1             1                 1                      1        3
Monday               1             1                 1                      1        3
Thursday             2             0                 1                      1        3
Tuesday              0             1                 1                      1        4
Wednesday            0             1                 0                      0        6                            

Code

self.start = datetime(2022, 1, 1)
self.end = datetime(2022, 3, 31)

self.file = 'timetable.csv'
self.sdf = pd.read_csv(self.file, header=0, index_col="Name")
self.subject_frequency = self.sdf.apply(pd.value_counts).fillna(0)
print(self.subject_frequency.to_string())
self.subject_frequency["sum"] = self.subject_frequency.sum(axis=1)

self.p = self.sdf.melt(var_name='Freq', value_name='Data', ignore_index=False).assign(variable=1)\
            .pivot_table('Freq', 'Name', 'Data', fill_value=0, aggfunc='count')
print(self.p.to_string())

Required Table

                       classes ...
Data Mining            32        
Data Science           32
Embedded Systems       32
Industrial Psychology  32
Project                146     

Will be adding more columns later, like current attendance percentage, percentage drop for each class missed, percent losses for taking leaves on Monday, Tuesday, ... etc so as to subtract them from attendance percentage.

The end goal is to analyse which day is safe to take a leave, and to monitor my percentage. If my direction could be better, please advise me.


Solution

  • One possible approach is to use bdate_range like you did and use weekday to select the weekdays (0-4) and map these numbers to their corresponding weekday names; then reindex the Frequency table with it. Then you get a DataFrame where each row corresponds to a weekday between 2022-1-1 and 2022-3-31. Then sum finds the total for each class:

    out = (freqtable.reindex(pd.bdate_range('2022-1-1','2022-3-31').weekday
                             .map(dict(enumerate(['Monday','Tuesday','Wednesday','Thursday','Friday']))))
           .sum()
           .rename_axis(['classes']).reset_index(name='count'))
    

    Output:

                     classes  count
    0            Data Mining     51
    1           Data Science     51
    2       Embedded Systems     51
    3  Industrial Psychology     51
    4                Project    244