Tags: java, java-8, java-stream, n-gram

Mapping a stream of tokens to a stream of n-grams in Java 8


I think this is a fairly basic question concerning Java 8 streams, but I have a difficult time thinking of the right search terms. So I am asking it here. I am just getting into Java 8, so bear with me.

I was wondering how I could map a stream of tokens to a stream of n-grams (represented as arrays of tokens of size n). For example, with n = 3, I would like to convert the following stream

{1, 2, 3, 4, 5, 6, 7}

to

{[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]}

How would I accomplish this with Java 8 streams? It should be possible to compute this concurrently, which is why I am interested in accomplishing this with streams (it also doesn't matter in what order the n-arrays are processed).

Sure, I could do it easily with old-fashioned for-loops, but I would prefer to make use of the stream API.


Solution

  • The Stream API is not really suited to this kind of operation. In functional programming jargon, what you are trying to compute is a sliding window of size n. Scala has it built in as the sliding() method, but the Java Stream API has no equivalent.

    One way to get the same result is to stream over the indexes of the input list.

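    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class NGrams { // class name chosen for illustration
        public static void main(String[] args) {
            List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
            List<List<Integer>> result = nGrams(list, 3);
            System.out.println(result); // [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
        }

        // One window per start index i in [0, size - n]; each window is copied
        // so the result does not depend on the backing list.
        private static <T> List<List<T>> nGrams(List<T> list, int n) {
            return IntStream.range(0, list.size() - n + 1)
                            .mapToObj(i -> new ArrayList<>(list.subList(i, i + n)))
                            .collect(Collectors.toList());
        }
    }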
    

    This code streams over the valid start indexes of the input list, maps each index i to a new list holding the elements from i (inclusive) to i + n (exclusive), and collects the results into a List.
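
Since the question also asks about computing the n-grams concurrently, here is a minimal sketch of how the index-based approach might be run in parallel. It assumes the input list is not modified while the stream runs, and that random access on the list is cheap (as with an ArrayList or Arrays.asList). The class name NGramSketch, the method name nGramsParallel, and the guard on n are assumptions added for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical class name, used only for this sketch.
class NGramSketch {

    // Maps each valid start index to a copy of the window [i, i + n).
    // The input list is only read, so the index stream can be processed
    // in parallel as long as no other thread modifies the list meanwhile.
    static <T> List<List<T>> nGramsParallel(List<T> list, int n) {
        if (n <= 0) {
            // Assumption: a non-positive window size is treated as a caller error.
            throw new IllegalArgumentException("n must be positive");
        }
        // range() yields an empty stream when the list has fewer than n elements.
        return IntStream.range(0, list.size() - n + 1)
                        .parallel()
                        .mapToObj(i -> new ArrayList<>(list.subList(i, i + n)))
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> tokens = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(nGramsParallel(tokens, 3));
    }
}
```

Because IntStream.range is an ordered source, Collectors.toList() still returns the windows in index order even with parallel(). If order truly does not matter, as the question states, the windows could instead be consumed with forEach. For small inputs the sequential version is likely to be faster, so it is worth measuring before switching.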