Search code examples
pythonjsonpandaslistfind-occurrences

Find and count all occurrences and position of numbers in a range in a list


I want to find the number of times each number appears at each index position in a list of 6 number sets when I do not know what the numbers will be, but they will range from 0-99 only.

Example list:

data = [['22', '45', '6', '72', '1', '65'], ['2', '65', '67', '23', '98', '1'], ['13', '45', '98', '4', '12', '65']]

Eventually I will be putting the resulting counts into a pandas DataFrame to look something like this:

num numofoccurances position numoftimesinposition
01         02            04            01
01         02            05            01
02         01            00            01
04         02            03            01
06         01            02            01
12         01            04            01
13         01            00            01
and so on...

The resulting data will be a little different due to the num repeating for each time it appears in a different index position, but hopefully this helps you understand what I'm looking for.

So far, this is what I've started:

data = json.load(f)
numbers = []
contains = []

'''
This section is simply taking the data from the json file and putting it all into a list of lists containing the 6 elements I need in each list
'''
for i in data['data']:
    item = [i[9], i[10]]
#   print(item)
    item = [words for segments in item for words in segments.split()]
    numbers.append(item)

'''
This is my attempt to count to number of occurrences for each number in the range then add it to a list.
'''
x = range(1,99)
for i in numbers:
    if x in i and not contains:
        contains.append(x)

Solution

  • import pandas as pd
    num_pos = [(num,pos) for i in data for pos,num in enumerate(i)]
    df = pd.DataFrame(num_pos,columns = ['number','position']).assign(numoftimesinposition = 1)
    df = df.astype(int).groupby(['number','position']).count().reset_index()
    
    df1 = df.groupby('number').numoftimesinposition.sum().reset_index().\
        rename(columns = {'numoftimesinposition':'numofoccurences'}).\
        merge(df, on='number')
    
    print(df1)
        number  numofoccurences  position  numoftimesinposition
    0        1                2         4                     1
    1        1                2         5                     1
    4        2                1         0                     1
    7        4                1         3                     1
    9        6                1         2                     1
    2       12                1         4                     1
    3       13                1         0                     1
    5       22                1         0                     1
    6       23                1         3                     1
    8       45                2         1                     2
    10      65                3         1                     1
    11      65                3         5                     2
    12      67                1         2                     1
    13      72                1         3                     1
    14      98                2         2                     1
    15      98                2         4                     1
    

    if the code above feels slow, then use Counter from collections:

    import pandas as pd
    from collections import Counter
    
    num_pos = [(int(num),pos) for i in data for pos,num in enumerate(i)]
    
    count_data = [(num,pos,occurence) for (num,pos), occurence in Counter(num_pos).items()]
    
    df = pd.DataFrame(count_data, columns = ['num','pos','occurence']).sort_values(by='num')
    
    df['total_occurence'] = [Counter(df.num).get(num) for num in df.num]
    print(df)