
Python - Count most frequent elements in list of same length


I've spent the last few hours looking for an answer to this without finding one, so I decided to ask here instead.

So, say I've got a list of strings, all of the same length, such as:

0004000000350
0000090033313
0004000604363
040006203330b
0004000300a3a
0004000403833
00000300333a9
0004000003a30

What would be the most efficient way to find the most frequently occurring character at each position?

An example output would be something like:

0 0 0 4 0 0 0 0 0 3 3 3 3



Edit: Thanks for the answers, gave me just what I was looking for! :)



Edit 2: Thought I'd add to the question here, as that may be the easiest way to figure it out. With the suggested answers, how would you go about adding a total count, as well as some sort of percentage? Since it's a large set of data, the most common occurrences alone aren't as clear as I was hoping they would have been.


Solution

  • zip the list of strings to "transpose" them, so that each tuple the iterator yields holds one column; apply collections.Counter to each column, then use its most_common method to pick the top entry, keeping only the character and dropping the count:

    data="""0004000000350
    0000090033313
    0004000604363
    040006203330b
    0004000300a3a
    0004000403833
    00000300333a9
    0004000003a30"""
    
    import collections
    
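    # zip(*rows) yields one tuple per column; most_common(1) returns [(char, count)],
    # so [0][0] keeps just the most frequent character of that column
    counts = [collections.Counter(x).most_common(1)[0][0] for x in zip(*data.splitlines())]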
    

    This yields:

    ['0', '0', '0', '4', '0', '0', '0', '0', '0', '3', '3', '3', '3']
    

    (If needed, join the characters back into a string with "".join(counts).)