Search code examples
pythonlistif-statementzipenumerate

Categorize Data by Day of Week Python


Given a list of lists in Python 2.7:

[['OCT 27, 2015',2,4,6,8],['NOV 03, 2015',1,5,9,2],['NOV 05, 2015',2,4,6,8]]

How do you print out the averages of the list in general; then averages based on the day of the week. EG:

2.5, 6.5, 10.5, 9, 1.5, 4.5, 7.5, 5, 2, 4, 6, 8

Where:

1.5, 4.5, 7.5, 5 are the average values produced above since Tuesday occurs twice in this list

2.5, 6.5, 10.5, 9 are the averages of the list in general

Then 2, 4, 6, 8 is the average of Thursday occurrences

I'm pretty sure I can use the zip() function to get all the like values. I'm just not sure about the averages calculated per day of the week (Monday, Tuesday, Wednesday,...)

How might I organize this into a clean looking tabular format with "Averages" and "Monday, Tuesday,.." as other headers

   Averages     Monday      Tuesday      Wednesday   Thursday   Friday  Saturday Sunday
2.5 6.5 10.5 9    0      1.5 4.5 7.5 5       0       2 4 6 8       0       0       0

Should I use an if statement?


Solution

  • This seems like an excellent job for pandas:

    from __future__ import print_function   
    import pandas as pd
    import numpy as np
    
    raw_data = [['OCT 27, 2015',2,4,6,8],['NOV 03, 2015',1,5,9,2],['NOV 05, 2015',2,4,6,8]]
    pandas_data = pd.DataFrame(raw_data, columns=['Date', 'A', 'B', 'C', 'D'])
    pandas_data["Date"] = pd.to_datetime(pandas_data["Date"])
    print(pandas_data)
    print(pandas_data.mean())
    

    output:

            Date  A  B  C  D
    0 2015-10-27  2  4  6  8
    1 2015-11-03  1  5  9  2
    2 2015-11-05  2  4  6  8
    
    A    1.666667
    B    4.333333
    C    7.000000
    D    6.000000
    

    This gives us our summary statistics for all columns, but for particular days of the week I'd do something like this (following the above code):

    pandas_data["Day of the Week"] = pandas_data["Date"].dt.dayofweek
    grouped_data = pandas_data.groupby('Day of the Week').aggregate(np.mean)
    print(grouped_data)
    

    output:

                       A    B    C  D
    Day of the Week
    1                1.5  4.5  7.5  5
    3                2.0  4.0  6.0  8
    

    pandas' convention for day of the week is Monday=0, Sunday=6

    pandas' groupby function does what it says on the tin: It groups the data based on some criteria, in this case the column 'Day of the Week'. The aggregate function then runs the given functions on those groups and gives you the results.