I've been looking for answers to this for the last few hours without finding what I was looking for, so I decided to ask here instead.
So, say I've got a list of strings, all of the same length, such as:
0004000000350
0000090033313
0004000604363
040006203330b
0004000300a3a
0004000403833
00000300333a9
0004000003a30
What would be the most efficient way to find the most frequently occurring character in each position?
An example output would be something like:
0 0 0 4 0 0 0 0 0 3 3 3 3
Edit: Thanks for the answers, gave me just what I was looking for! :)
Edit 2: Thought I'd extend the question, as this may be the easiest place to ask. With the suggested answers, how would you go about adding a total count for each position, as well as some sort of percentage? Since it's a large set of data, the most common occurrences alone aren't as clear as I was hoping they would have been.
zip the list of strings to "transpose" them, so that each column's characters are presented by the same iterator; apply collections.Counter
to each column; then use the most_common
method and discard everything but the character itself:
data="""0004000000350
0000090033313
0004000604363
040006203330b
0004000300a3a
0004000403833
00000300333a9
0004000003a30"""
import collections

# most_common(1) returns [(char, count)]; [0][0] keeps just the character
counts = [collections.Counter(col).most_common(1)[0][0] for col in zip(*data.splitlines())]
this yields:
['0', '0', '0', '4', '0', '0', '0', '0', '0', '3', '3', '3', '3']
(Join the characters back into a string if needed, using "".join(counts).)
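For the second edit (total count and percentage per position), the same zip/Counter approach extends naturally: most_common(1) already returns the count alongside the character, and dividing by the number of rows gives the share. A minimal sketch, reusing the data string above (the loop and variable names are just illustrative):

```python
import collections

data = """0004000000350
0000090033313
0004000604363
040006203330b
0004000300a3a
0004000403833
00000300333a9
0004000003a30"""

lines = data.splitlines()
total = len(lines)

# For each column: the most common character, its count, and its share of rows
for pos, column in enumerate(zip(*lines)):
    char, count = collections.Counter(column).most_common(1)[0]
    print(f"position {pos}: {char!r} appears {count}/{total} times ({count / total:.0%})")
```

With the sample data this reports, for example, that position 0 is '0' in 8/8 rows, while position 3 is '4' in only 5/8 rows, which makes weakly-dominant positions easy to spot in a larger data set.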