
How to perform matrix multiplication with conditions in Python?


I am using matrix multiplication on a DataFrame and its transpose with df @ df.T.

Suppose I have a df like the one below, where 1 indicates that the object has the property and 0 indicates that it does not:

Object Property1 Property2 Property3
A      1         1         1
B      0         1         1
C      1         0         0

Using df @ df.T gives me:

   A  B  C
A  3  2  1  
B  2  2  0
C  1  0  1

This can be thought of as a matrix showing how many properties each object has in common with each other object.
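For reference, the binary case really is a plain matrix product in pandas. A minimal sketch that rebuilds the table above with Object as the index (an assumption about how the df is set up) and reproduces the count matrix:

```python
import pandas as pd

# Binary property table from the question: 1 = has the property, 0 = does not
df = pd.DataFrame(
    {"Property1": [1, 0, 1], "Property2": [1, 1, 0], "Property3": [1, 1, 0]},
    index=pd.Index(["A", "B", "C"], name="Object"),
)

# Entry (i, j) is the dot product of rows i and j, i.e. the number of
# properties that objects i and j both have
shared = df @ df.T
print(shared)
```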

I now want to modify the problem so that, instead of a binary indication of whether an object has a property, the property columns show the level of that property. The new df looks like this (the values 1, 2, 3 are levels, and 0 still means that the object does not have the property):

Object Property1 Property2 Property3
A      3         2         1
B      0         2         3
C      2         0         0

I want to apply matrix multiplication, but with an altered definition of a "common" property: two objects only have a property in common if their levels for that property are within ±1 of each other.

Below is what the result will look like:

   A  B  C
A  3  1  1  
B  1  2  0
C  1  0  1

Note that the number of properties A and B have in common has changed from 2 to 1, because their Property3 levels (1 and 3) are not within ±1 of each other. Also, 0 still means that the object does not have the property, so A and C still have 1 property in common (Property1, with levels 3 and 2), since C's Property2 and Property3 are 0.
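The rule can be checked one pair of objects at a time: a property counts only if both levels are non-zero and differ by at most 1. A small sketch of that per-pair count (the helper name common_count is hypothetical, used only for illustration):

```python
def common_count(u, v):
    """Count positions where both levels are non-zero and |u - v| <= 1."""
    return sum(1 for x, y in zip(u, v) if x != 0 and y != 0 and abs(x - y) <= 1)


# Level rows from the question (Property1, Property2, Property3)
A, B, C = [3, 2, 1], [0, 2, 3], [2, 0, 0]

print(common_count(A, B))  # only Property2 (2 vs 2) counts -> 1
print(common_count(A, C))  # only Property1 (3 vs 2) counts -> 1
print(common_count(B, C))  # no property is non-zero for both -> 0
```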

How can I achieve this in Python?


Solution

  • This can be done by writing the matrix multiplication of two DataFrames by hand and then modifying the inner sum.

    Code

    import pandas as pd

    # DataFrame matrix multiplication,
    # i.e. equivalent to df1 @ df2
    def df_multiply(df_a, df_b):
        '''
        Matrix multiplication of the values in two DataFrames.
        Returns a DataFrame indexed by df_a's index, with df_b's columns.
        '''
        a = df_a.values
        b = df_b.values
        # Transpose b so that each element of zip_b is one column of df_b
        zip_b = list(zip(*b))
        # Each entry is the dot product of a row of a with a column of b
        result = [[sum(ele_a * ele_b for ele_a, ele_b in zip(row_a, col_b))
                   for col_b in zip_b] for row_a in a]

        return pd.DataFrame(data=result, index=df_a.index, columns=df_b.columns)
    
    # Modify df_multiply for the desired result
    def df_multiply_modified(df_a, df_b):
        '''
        Modified matrix multiplication of the values in two DataFrames:
        instead of summing products, count the positions where both values
        are non-zero and differ by at most 1.
        Returns a DataFrame indexed by df_a's index, with df_b's columns.
        '''
        a = df_a.values
        b = df_b.values
        # Transpose b so that each element of zip_b is one column of df_b
        zip_b = list(zip(*b))

        # Add 1 when both values are non-zero and their difference is <= 1,
        # i.e. ele_a and ele_b and abs(ele_a - ele_b) <= 1
        result = [[sum(1 if ele_a and ele_b and abs(ele_a - ele_b) <= 1 else 0
                       for ele_a, ele_b in zip(row_a, col_b))
                   for col_b in zip_b] for row_a in a]

        return pd.DataFrame(data=result, index=df_a.index, columns=df_b.columns)
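The nested comprehensions above loop over every pair of rows in pure Python. As an alternative that is not part of the original answer, the same rule can be vectorized with NumPy broadcasting by building an (objects × objects × properties) boolean array and summing over the last axis. A sketch, assuming the objects are in the index as in the usage below; the function name common_within_one is hypothetical:

```python
import numpy as np
import pandas as pd


def common_within_one(df):
    """For every pair of rows, count the columns where both values are
    non-zero and differ by at most 1."""
    v = df.to_numpy()
    a = v[:, None, :]  # shape (n, 1, p): row i, broadcast along j
    b = v[None, :, :]  # shape (1, n, p): row j, broadcast along i
    match = (a != 0) & (b != 0) & (np.abs(a - b) <= 1)
    # Summing the booleans over the property axis gives an (n, n) count matrix
    return pd.DataFrame(match.sum(axis=2), index=df.index, columns=df.index)


df = pd.DataFrame(
    {"Property1": [3, 0, 2], "Property2": [2, 2, 0], "Property3": [1, 3, 0]},
    index=pd.Index(["A", "B", "C"], name="Object"),
)
result = common_within_one(df)
print(result)  # same counts as the desired result in the question
```

The trade-off is memory: the intermediate array holds n × n × p booleans, so for very many objects the loop version (or processing the rows in chunks) may be preferable.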
    

    Usage

    Original Multiplication

    df = pd.DataFrame({'Object':['A', 'B', 'C'],
                      'Property1':[1, 0, 1],
                      'Property2':[1, 1, 0],
                      'Property3':[1, 1, 0]})
    
    df.set_index('Object', inplace=True)
    print(df_multiply(df, df.T))
    # Output (same as df @ df.T):
    # Object  A  B  C
    # Object
    # A       3  2  1
    # B       2  2  0
    # C       1  0  1
    

    Modified Multiplication

    # Use df_multiply_modified
    df = pd.DataFrame({'Object':['A', 'B', 'C'],
                      'Property1':[3, 0, 2],
                      'Property2':[2, 2, 0],
                      'Property3':[1, 3, 0]})
    df.set_index('Object', inplace=True)
    print(df_multiply_modified(df, df.T))
    # Output (same as desired):
    # Object  A  B  C
    # Object
    # A       3  1  1
    # B       1  2  0
    # C       1  0  1
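As a sanity check on the definition: on 0/1 data, two non-zero entries are both 1 and therefore within ±1 of each other, so the modified count reduces to the ordinary product df @ df.T. A short sketch confirming this on the binary table from the question (the vectorized helper common_within_one is an illustrative assumption, not the answer's code):

```python
import numpy as np
import pandas as pd


def common_within_one(values):
    """Pairwise count of columns where both entries are non-zero and within 1."""
    a, b = values[:, None, :], values[None, :, :]
    return ((a != 0) & (b != 0) & (np.abs(a - b) <= 1)).sum(axis=2)


binary = pd.DataFrame(
    {"Property1": [1, 0, 1], "Property2": [1, 1, 0], "Property3": [1, 1, 0]},
    index=pd.Index(["A", "B", "C"], name="Object"),
)

modified = common_within_one(binary.to_numpy())
ordinary = (binary @ binary.T).to_numpy()
print(np.array_equal(modified, ordinary))  # True
```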