python numpy nlp nltk

Ngram in python with start_pad


I'm new to Python. I have learned some basics about lists and tuples, but I don't fully understand my code. I want to create a list with three elements, where each element is a tuple of two items, like this: [('~','a'),('a','b'),('b','c')]. The first item in each tuple should have two characters, or as many characters as the context length, giving something like this: [('~a','a'),('ab','b'),('bc',' c')]. Can anyone help me? Here is my code:


def getNGrams(wordlist, n):
    ngrams = []
    padded_tokens = "~"*(n) + wordlist
    t = tuple(wordlist)
    for i in range(3):
        t = tuple(padded_tokens[i:i+n])
        ngrams.append(t)
    return ngrams

Solution

  • IIUC, you can change the function as shown below to get what you want:

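    def getNGrams(wordlist, n):
        ngrams = []
        # Left-pad with n '~' characters so the first character has a full context.
        padded_tokens = "~"*n + wordlist
        for i in range(len(wordlist)):
            # Pair the n characters preceding position i with the character at i.
            ngrams.append((padded_tokens[i:i+n], wordlist[i]))
        return ngrams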
    
    print(getNGrams('abc',1))
    print(getNGrams('abc',2))
    print(getNGrams('abc',3))
    

    Output:

    [('~', 'a'), ('a', 'b'), ('b', 'c')]
    [('~~', 'a'), ('~a', 'b'), ('ab', 'c')]
    [('~~~', 'a'), ('~~a', 'b'), ('~ab', 'c')]
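
If the input is a list of word tokens rather than a single string, the same padding idea could be sketched roughly as below. This is only an illustration and not part of the answer above: the name `get_ngrams` and the `pad` parameter are assumptions made for the example, and the context is returned as a tuple of items instead of a joined string so that token boundaries are not lost.

```python
def get_ngrams(tokens, n, pad="~"):
    """Return (context, target) pairs; context is the n items before target.

    `get_ngrams` and `pad` are hypothetical names used for illustration only.
    """
    tokens = list(tokens)
    # Prepend n pad symbols so the first token still has a full-length context.
    padded = [pad] * n + tokens
    # padded[i:i + n] holds the n items that precede tokens[i].
    return [(tuple(padded[i:i + n]), tokens[i]) for i in range(len(tokens))]


print(get_ngrams("abc", 2))
print(get_ngrams(["the", "cat", "sat"], 2, pad="<s>"))
```

Because a string is converted to a list of characters first, the same function accepts both 'abc' and a list of words. The number of pairs follows the input length, so there is no need to hard-code `range(3)` as in the question.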