
How to count sequences in Python


I have a series like the one below. The data is actually a DataFrame column; I converted it just to show it here.

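Responses = ['N', 'N', 'N', 'N', 'N', 'S', 'N', 'N', 'N', 'N', 'N', 'N', 'S',
             'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N',
             'S', 'S', 'S', 'S', 'S', 'S', 'N', 'N', 'S', 'N', 'N', 'N',
             'S', 'S', 'S', 'S', 'N', 'S', 'N', 'N', 'N', 'N', 'N', 'N', 'N',
             'S', 'S', 'N', 'N', 'N', 'S', 'S', 'S', 'N',
             'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S',
             'N', 'N', 'N', 'N', 'N', 'S', 'N', 'N', 'N', 'N', 'N', 'N', 'S',
             'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N',
             'S', 'S', 'S', 'S', 'S', 'S', 'N', 'N', 'S', 'N', 'N', 'N',
             'S', 'S', 'S', 'S', 'N', 'S', 'N', 'N', 'N', 'N', 'N', 'N', 'N',
             'S', 'S', 'N', 'N', 'N', 'S', 'S', 'S', 'N',
             'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S']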

I use this code to find the longest sequence of 'S' (I know that this is not the best way):

import pandas as pd

df = pd.read_csv('C:\\teste\\pergunta.csv')

count = 0      # length of the current run of 'S'
prev = 0       # length of the longest run seen so far
indexend = 0   # row where the longest run ends
for i in range(len(df)):
    if df['Responses'].iloc[i] == 'S':
        count += 1
    else:
        # A run just ended at row i - 1; keep it if it is the longest so far
        if count > prev:
            prev = count
            indexend = i - 1
        count = 0

# The data may end in the middle of a run, so check the final run too
if count > prev:
    prev = count
    indexend = len(df) - 1

print("The longest sequence is " + str(prev))

The code reports that the longest sequence of 'S' is 8, which is correct.

However, this sequence of 8 S occurs twice.
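To count every run rather than only the longest, the same loop can record each run's length at the moment the run ends. A minimal sketch, assuming the responses are available as any iterable of 'S'/'N' strings (a plain list, or `df['Responses']`, since iterating a pandas Series yields its values); `run_lengths` is a hypothetical helper name and the sample data is made up:

```python
from collections import Counter


def run_lengths(values, target='S'):
    """Count how many runs of `target` there are of each length."""
    lengths = Counter()  # run length -> number of runs with that length
    count = 0            # length of the run currently being scanned
    for v in values:
        if v == target:
            count += 1
        elif count:
            # The run just ended: record its length and reset
            lengths[count] += 1
            count = 0
    if count:
        # The data ended in the middle of a run, so record that one too
        lengths[count] += 1
    return lengths


# Made-up sample; df['Responses'] could be passed the same way
sample = ['N', 'S', 'S', 'N', 'S', 'S', 'S', 'N', 'S', 'S']
counts = run_lengths(sample)
for length, freq in sorted(counts.items(), reverse=True):
    print(f'{length} {freq} times')
```

Using a `Counter` keeps the single pass of the original loop while remembering every run length, not just the maximum.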

My question: I need code that counts the individual sequences of S by length, from the longest to the shortest.

Something like:

Sequence  Frequency
8         2 times
7         1 times
6         3 times
5         1 times
4         x times
3         x times
2         x times
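A compact way to produce a table in this shape is the standard library's `itertools.groupby`, which splits a sequence into maximal runs of equal consecutive values. This is a sketch of one possible approach, not the only one; `sequence_table` is a hypothetical helper, and the sample list is made up (any iterable of 'S'/'N' values, such as `df['Responses']`, should work the same way):

```python
from collections import Counter
from itertools import groupby


def sequence_table(values, target='S'):
    """Return (run length, frequency) pairs for runs of `target`, longest first."""
    counts = Counter(
        sum(1 for _ in run)               # length of this run
        for key, run in groupby(values)   # groupby yields maximal runs of equal values
        if key == target                  # only keep runs of the target value
    )
    # Sorting the pairs in reverse puts the longest run length first
    return sorted(counts.items(), reverse=True)


# Made-up sample data
responses = list('NNSSSNSSSNSNSSSSSSSS')
print('Sequence  Frequency')
for length, freq in sequence_table(responses):
    print(f'{length:<8}  {freq} times')
```

Because each run length appears only once as a key, sorting the `(length, frequency)` pairs in reverse orders the rows from the longest sequence to the shortest.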

Can someone help me?

Thanks


Solution

  • Try this:

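    s = pd.Series(Responses)
    # Label each run of identical values, keep only the 'S' rows,
    # measure each run's length, then count how often each length occurs
    s = s.loc[s.eq('S')].groupby(s.ne(s.shift()).cumsum()).count().value_counts()
    df = s.map('{} times'.format).rename_axis('Sequence').reset_index(name='Frequency')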
    

    or

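    m = df['Responses'].eq('N')

    # m.cumsum() is constant within each run of 'S'; mask(m) drops the 'N' rows,
    # so each group is one run of 'S' and count() gives its length
    (df.groupby(m.cumsum().mask(m))['Responses']
       .count()
       .value_counts()
       .astype(str)
       .add(' times')
       .rename_axis('Sequence')
       .reset_index(name='Frequency'))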
    

    Output:

       Sequence Frequency
    0         1   8 times
    1         6   2 times
    2         4   2 times
    3         2   2 times
    4         3   2 times
    5         8   2 times
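Two details may help when adapting the first snippet. `s.ne(s.shift()).cumsum()` gives every run of identical values its own label, which is what the `groupby` relies on. Also, `value_counts()` orders rows by frequency rather than by run length, so the output above is not sorted from longest to shortest as the question asked; adding `sort_index(ascending=False)` handles that. A minimal sketch on a small, made-up series (the question's `Responses` list should work the same way):

```python
import pandas as pd

# Made-up sample data; the question's Responses list works the same way
s = pd.Series(list('NSSNSSSNSNSS'))

# A new run starts wherever a value differs from the previous one, so the
# cumulative sum of these "changed" flags gives each run its own label
run_id = s.ne(s.shift()).cumsum()
print(pd.DataFrame({'value': s, 'run_id': run_id}))

# Keep only the 'S' rows and measure the length of each 'S' run
is_s = s.eq('S')
lengths = s[is_s].groupby(run_id[is_s]).size()

# Tally how often each length occurs, longest run first
freq = lengths.value_counts().sort_index(ascending=False)

table = (freq.astype(str).add(' times')
             .rename_axis('Sequence')
             .reset_index(name='Frequency'))
print(table)
```

Filtering `run_id` with the same boolean mask as `s` keeps both on identical indexes, so the grouping does not depend on pandas aligning a longer grouper Series.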