Conflating Java streams

I have a very large Stream of versioned documents, ordered by document id and version.

E.g. Av1, Av2, Bv1, Cv1, Cv2

I have to convert this into another Stream whose records are aggregated by document id.

A[v1, v2], B[v1], C[v1, v2]

Can this be done without using Collectors.groupingBy()? I don't want to use groupingBy() because it loads every item in the stream into memory before grouping them. In theory, the whole stream never needs to be in memory at once: because it is ordered by document id, a group is complete as soon as the id changes.

Solution

You can use groupRuns in the StreamEx library for this:

import static java.lang.System.out;
import static java.util.Arrays.asList;

import java.util.List;
import one.util.streamex.StreamEx;

class Document {
    public String id;
    public int version;
    public Document(String id, int version) {
        this.id = id;
        this.version = version;
    }
    @Override
    public String toString() {
        return "Document{" + id + version + "}";
    }
}

public class MyClass {
    private static List<Document> docs = asList(
        new Document("A", 1),
        new Document("A", 2),
        new Document("B", 1),
        new Document("C", 1),
        new Document("C", 2)
    );

    public static void main(String[] args) {
        // Group adjacent documents that share an id; the input must already be ordered by id.
        StreamEx<List<Document>> groups = StreamEx.of(docs).groupRuns((l, r) -> l.id.equals(r.id));
        // Print each group as it is produced instead of collecting all groups into a list first.
        groups.forEach(out::println);
    }
}

which outputs:

[Document{A1}, Document{A2}]
[Document{B1}]
[Document{C1}, Document{C2}]

I haven't verified that this avoids consuming the entire stream, but I can't see why it would need to, given that groupRuns only compares adjacent elements.
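
For intuition on why no full buffering is needed, here is a minimal plain-JDK sketch of the same idea. It is not how StreamEx implements groupRuns; the class RunGrouping and its groupRuns helper are hypothetical names made up for this illustration, and the documents are reduced to strings like "Av1" whose first character stands in for the id. The sketch holds only the current run plus one look-ahead element, and uses peek to record how many source elements have actually been read.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiPredicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class RunGrouping {

    /** Hypothetical helper: lazily groups adjacent elements while sameGroup(previous, current) holds. */
    public static <T> Stream<List<T>> groupRuns(Stream<T> source, BiPredicate<T, T> sameGroup) {
        Iterator<T> in = source.iterator();
        Iterator<List<T>> out = new Iterator<List<T>>() {
            private T pending;          // first element of the next run, already read from the source
            private boolean hasPending; // separate flag so that null elements are handled too

            @Override
            public boolean hasNext() {
                return hasPending || in.hasNext();
            }

            @Override
            public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> run = new ArrayList<>();
                T last = hasPending ? pending : in.next();
                hasPending = false;
                pending = null;
                run.add(last);
                // Read until the predicate fails; keep the element that broke the run for next time.
                while (in.hasNext()) {
                    T current = in.next();
                    if (!sameGroup.test(last, current)) {
                        pending = current;
                        hasPending = true;
                        break;
                    }
                    run.add(current);
                    last = current;
                }
                return run;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(out, Spliterator.ORDERED), false)
            .onClose(source::close);
    }

    public static void main(String[] args) {
        List<String> pulled = new ArrayList<>(); // records every element read from the source
        Stream<String> docs = Stream.of("Av1", "Av2", "Bv1", "Cv1", "Cv2").peek(pulled::add);
        Iterator<List<String>> groups =
            groupRuns(docs, (l, r) -> l.charAt(0) == r.charAt(0)).iterator();
        System.out.println("first group: " + groups.next());
        System.out.println("source elements read so far: " + pulled);
    }
}
```

If the second line printed lists fewer than all five elements, the first group was produced without draining the source. The same peek trick should work for checking StreamEx's groupRuns directly, though I have not run that.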