python dataframe matrix transition markov

Transition matrix for counts and proportions in Python


I have a matrix with the grades from a class for different years (rows are years, columns are grades). I want to build a transition matrix showing the change between years.

For instance, I want year t-1 on the y-axis and year t on the x-axis, and then a transition matrix with the difference in the number of people with grade A between year t-1 and t, grade B between year t-1 and t, and so on. I also want a second transition matrix with the proportions, for example: between year t-1 and t there are z% more/fewer people with grade A/B/C/D/F.

Obviously the most important part is the diagonal, which represents the change for the same grade between years.

I want this to be done in Python.

Thank you very much, I hope everything is clear.

Result example: [image]


Solution

  • You can use the pandas library with df.diff to get year-over-year differences. NumPy can generate the matrix of all pairwise differences using np.subtract.outer. Below is an example.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    years = ['2015', '2016', '2017']
    grades = ['A', 'B', 'C', 'D']
    
    # Random example data; the output shown below comes from one particular draw
    df = pd.DataFrame(np.random.randint(0, 10, (3, 4)), columns=grades, index=years)
    
    print(df)
    
          A  B  C  D
    2015  5  0  2  0
    2016  7  2  0  2
    2017  3  7  6  7
    
    df_diff = df.diff(axis=0)
    print(df_diff)
    

    Each row of df_diff is the difference between the current row and the preceding row of the original df, so the first row is NaN:

             A     B     C     D
    2015   NaN   NaN   NaN   NaN
    2016   2.0   2.0  -2.0   2.0
    2017  -4.0   5.0   6.0   5.0
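
For the proportions the question asks about, the same year-over-year difference can be divided by the previous year's count. This is a minimal sketch, assuming the same layout as df above (years as rows, grades as columns). The counts are hard-coded from the printed example so the result is reproducible, and treating a previous-year count of 0 as NaN is an assumption of this sketch, not something the original answer specifies.

```python
import numpy as np
import pandas as pd

years = ['2015', '2016', '2017']
grades = ['A', 'B', 'C', 'D']
# Counts copied from the example output above (hard-coded instead of random)
df = pd.DataFrame([[5, 0, 2, 0], [7, 2, 0, 2], [3, 7, 6, 7]],
                  columns=grades, index=years)

# Relative change versus the previous year: (this year - last year) / last year.
# A previous-year count of 0 has no defined relative change, so mask it as NaN.
prev = df.shift(1).replace(0, np.nan)
df_prop = (df - prev) / prev

# Percent change per grade, year over year
print((df_prop * 100).round(1))
```

With these counts, grade A goes from 5 to 7 between 2015 and 2016, so its entry is 0.4 (a 40% increase).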
    
    # Flatten the table row by row and label each cell as year + grade, e.g. '2015A'
    differences = [y + g for y in years for g in grades]
    a = df.to_numpy(dtype=float).ravel()
    
    # Entry [i, j] is a[i] - a[j]: the row cell minus the column cell
    df3 = pd.DataFrame(np.subtract.outer(a, a), columns=differences, index=differences)
    print(df3)
    
           2015A  2015B  2015C  2015D  2016A  2016B  2016C  2016D  2017A  2017B  2017C  2017D
    2015A    0.0    5.0    3.0    5.0   -2.0    3.0    5.0    3.0    2.0   -2.0   -1.0   -2.0
    2015B   -5.0    0.0   -2.0    0.0   -7.0   -2.0    0.0   -2.0   -3.0   -7.0   -6.0   -7.0
    2015C   -3.0    2.0    0.0    2.0   -5.0    0.0    2.0    0.0   -1.0   -5.0   -4.0   -5.0
    2015D   -5.0    0.0   -2.0    0.0   -7.0   -2.0    0.0   -2.0   -3.0   -7.0   -6.0   -7.0
    2016A    2.0    7.0    5.0    7.0    0.0    5.0    7.0    5.0    4.0    0.0    1.0    0.0
    2016B   -3.0    2.0    0.0    2.0   -5.0    0.0    2.0    0.0   -1.0   -5.0   -4.0   -5.0
    2016C   -5.0    0.0   -2.0    0.0   -7.0   -2.0    0.0   -2.0   -3.0   -7.0   -6.0   -7.0
    2016D   -3.0    2.0    0.0    2.0   -5.0    0.0    2.0    0.0   -1.0   -5.0   -4.0   -5.0
    2017A   -2.0    3.0    1.0    3.0   -4.0    1.0    3.0    1.0    0.0   -4.0   -3.0   -4.0
    2017B    2.0    7.0    5.0    7.0    0.0    5.0    7.0    5.0    4.0    0.0    1.0    0.0
    2017C    1.0    6.0    4.0    6.0   -1.0    4.0    6.0    4.0    3.0   -1.0    0.0   -1.0
    2017D    2.0    7.0    5.0    7.0    0.0    5.0    7.0    5.0    4.0    0.0    1.0    0.0
    

    Plot this matrix using matshow from matplotlib:

    plt.matshow(df3)
    plt.colorbar()
    plt.show()
    

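
If only one pair of consecutive years is needed, with year t-1 on the rows and year t on the columns as the question describes, the full 12 x 12 matrix can be narrowed to a single grade-by-grade block. The sketch below is one possible reading of the question, not part of the original answer: grade_change_matrices is a hypothetical helper, the counts are hard-coded from the example output, and turning zero previous-year counts into NaN is an assumption.

```python
import numpy as np
import pandas as pd

years = ['2015', '2016', '2017']
grades = ['A', 'B', 'C', 'D']
# Counts copied from the example output above (hard-coded instead of random)
df = pd.DataFrame([[5, 0, 2, 0], [7, 2, 0, 2], [3, 7, 6, 7]],
                  columns=grades, index=years)


def grade_change_matrices(df, prev_year, year):
    """Hypothetical helper: count and proportion changes between two years.

    Rows are grades in prev_year, columns are grades in year.
    """
    prev = df.loc[prev_year].to_numpy(dtype=float)
    curr = df.loc[year].to_numpy(dtype=float)
    # counts[i, j] = curr[j] - prev[i]; the diagonal is the same-grade change
    counts = curr[None, :] - prev[:, None]
    # Divide each row by its prev_year count; zero counts become NaN (assumption)
    base = np.where(prev == 0, np.nan, prev)
    props = counts / base[:, None]
    rows = [f'{prev_year}{g}' for g in df.columns]
    cols = [f'{year}{g}' for g in df.columns]
    return (pd.DataFrame(counts, index=rows, columns=cols),
            pd.DataFrame(props, index=rows, columns=cols))


counts, props = grade_change_matrices(df, '2015', '2016')
print(counts)
print(props.round(2))
```

The diagonal of counts matches the 2016 row of df_diff (2.0, 2.0, -2.0, 2.0), which is the same-grade change the question singles out as most important.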