Making 4-gram data by shifting 2 units of data at a time

I have a sequence of data that I want to build n-grams from. An excerpt of the sequence looks like this:

8c b0 00 f0 05 fc 04 46 00 f0 fe fb 40 f2 00 05 c2 f2 00 05 28 78 00

I currently use nltk's ngrams() function to build 4-grams from this data, e.g. 8c b0 00 f0, b0 00 f0 05, 00 f0 05 fc, etc., which creates the 4-grams by sliding the window one token at a time. However, I need to slide two tokens at a time instead. So the expected output is 8c b0 00 f0, 00 f0 05 fc, 05 fc 04 46, etc.

I searched but could not find a way to do this instead of shifting one by one, as I currently do. The following is the part of the code (4 lines) that shows my current approach:

 import re
 from nltk import ngrams

 s = finalString.lower()
 s = re.sub(r'[^a-zA-Z0-9\s]', ' ', s)
 tokens = [token for token in s.split(" ") if token != ""]
 output = list(ngrams(tokens, 4))

Solution

One trick is to build all the 4-grams as before and then keep only every second one:

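from nltk import ngrams

s = '8c b0 00 f0 05 fc 04 46 00 f0 fe fb 40 f2 00 05 c2 f2 00 05 28 78 00'

# ngrams() yields a window starting at every token (start indices 0, 1, 2, ...).
# Slicing with [::2] keeps the windows at start indices 0, 2, 4, ...;
# the 2 is the step (how far the window shifts), not the window size.
output = list(ngrams(s.split(), 4))[::2]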

Output:

[('8c', 'b0', '00', 'f0'), ('00', 'f0', '05', 'fc'), ('05', 'fc', '04', '46'), ('04', '46', '00', 'f0'), ('00', 'f0', 'fe', 'fb'), ('fe', 'fb', '40', 'f2'), ('40', 'f2', '00', '05'), ('00', '05', 'c2', 'f2'), ('c2', 'f2', '00', '05'), ('00', '05', '28', '78')]
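
If the sequence is long, building every 4-gram only to throw half of them away may be wasteful. As a rough alternative, here is a minimal sketch that steps through the token list directly. The helper name `strided_ngrams` and its parameters are hypothetical (it is not part of nltk), and it assumes the tokens are already in a list:

```python
def strided_ngrams(tokens, n, step):
    """Return n-grams whose start positions advance by `step` tokens.

    Hypothetical helper for illustration; it is not part of nltk.
    """
    if n <= 0 or step <= 0:
        raise ValueError("n and step must be positive")
    # The last valid start index is len(tokens) - n, so an input
    # shorter than n produces an empty list.
    return [tuple(tokens[i:i + n]) for i in range(0, len(tokens) - n + 1, step)]


s = '8c b0 00 f0 05 fc 04 46 00 f0 fe fb 40 f2 00 05 c2 f2 00 05 28 78 00'
output = strided_ngrams(s.split(), 4, 2)
```

For the sample data this should give the same ten tuples as the slicing approach, and changing `step` lets you shift by any other amount.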