
Python: Calculate Redundancy Rate of a List


I am trying to gauge the redundancy rate of a list.

For example, these are the values I would expect:

L = [a, a, a, a] => redundancy rate = 1

L = [a, b, c, d] => redundancy rate = 0

L = [a, a, b, b] => redundancy rate = 0.5

I couldn't come up with a meaningful way to compute this.


Solution

  • Although the output matches the values in the problem description, I'm not sure this is a valid measure in general. Taking the min of the ratios instead of the mean might be a better choice.

    import pandas as pd

    l1 = ['a', 'a', 'a', 'a']
    l2 = ['a', 'b', 'c', 'd']
    l3 = ['a', 'a', 'b', 'b']
    
    def f(l):
        s = pd.Series(l)
        # Relative frequency of each distinct value, indexed by value.
        ratio = s.value_counts() / len(l)
        # Every occurrence after the first one counts as redundant.
        redundant = s[s.duplicated(keep='first')]
        if not redundant.empty:
            # Average the relative frequencies of the redundant elements.
            return redundant.map(ratio).mean()
        else:
            # No repeated values at all.
            return 0
    
    print("redundancy rate of l1: {}".format(f(l1)))
    print("redundancy rate of l2: {}".format(f(l2)))
    print("redundancy rate of l3: {}".format(f(l3)))
    
    

    Output

    redundancy rate of l1: 1.0
    redundancy rate of l2: 0
    redundancy rate of l3: 0.5
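
The open question in this answer, whether min is a better aggregate than mean, can be explored with a small standard-library sketch. It is not the code from the answer: the name `redundancy_rate`, the `aggregate` parameter, and the `sample` list are all hypothetical, and it assumes the list elements are hashable. It follows the same idea of scoring every repeated occurrence by the relative frequency of its value.

```python
from collections import Counter
from statistics import mean


def redundancy_rate(items, aggregate=mean):
    """Aggregate the relative frequency of every repeated occurrence.

    Hypothetical helper: each element after its first occurrence is
    scored with the share of the list that its value takes up.
    """
    if not items:
        return 0
    counts = Counter(items)
    seen = set()
    scores = []
    for item in items:
        if item in seen:
            # Repeat occurrence: score it by its value's relative frequency.
            scores.append(counts[item] / len(items))
        else:
            seen.add(item)
    # No repeats at all means no redundancy.
    return aggregate(scores) if scores else 0


# Hypothetical extra list where the two aggregates disagree.
sample = ['a', 'a', 'a', 'b', 'b', 'c']
print(redundancy_rate(sample, aggregate=mean))
print(redundancy_rate(sample, aggregate=min))
```

On the three lists from the question both aggregates give the same values (1, 0 and 0.5), because every repeated element has the same ratio. On `sample` the mean comes out at roughly 0.44 and the min at roughly 0.33, so the choice only matters when values repeat unevenly.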