Looking for better ways to generate a list of edges for a graph in jqwik property testing framework

Currently I am using:

    @Provide
    Arbitrary<List<Tuple.Tuple3<Integer,Integer,Integer>>> edgeLists (
        TypeUsage type, ArbitraryProvider.SubtypeProvider subtype) {
        int vertices = 10;
        int degree_min = 1;
        int degree_max = 4;
        int min_edge_flow = 1;
        int max_edge_flow = 10;

        for (Annotation a : type.getAnnotations()) {
            if (a instanceof MaxFlowParameters) {
                MaxFlowParameters params = (MaxFlowParameters) a;
                vertices = Math.max(1, params.vertices());
                degree_min = Math.max(1, params.degree_min());
                degree_max = Math.min(vertices, Math.max(degree_min, params.degree_max()));
                min_edge_flow = Math.min(vertices, Math.max(0, params.min_edge_flow()));
                max_edge_flow = Math.min(vertices, Math.max(min_edge_flow, params.max_edge_flow()));
            }
        }

        Function<List<Integer>,List<Integer>> expand = new Function<List<Integer>,List<Integer>> () {
            @Override
            public List<Integer> apply (List<Integer> t) {
                List<Integer> result = new ArrayList<>();
                ListUtils.enumerate(t, (idx,copies) -> {
                    while (copies-- > 0) result.add(idx+1);
                    return true;
                });
                return result;
            }
        };

        int num_vertices = vertices;
        int the_min_edge_flow = min_edge_flow;
        int the_max_edge_flow = max_edge_flow;

        return Arbitraries.integers().between(degree_min, degree_max).list().ofSize(vertices).map(expand)
            .flatMap(sources -> Arbitraries.integers().between(1, num_vertices).list().ofSize(sources.size())
                .flatMap(targets -> Arbitraries.integers().between(the_min_edge_flow, the_max_edge_flow).list().ofSize(sources.size())
                    .map(flows -> {
                        int limit = sources.size();
                        List<Tuple3<Integer,Integer,Integer>> result = new ArrayList<>(limit);
                        for (int i = 0; i < limit; i++) {
                            result.add(Tuple.of(sources.get(i), targets.get(i), flows.get(i)));
                        }
                        return result;
                    })));
    }

    @Provide
    Arbitrary<Graph<String,IntegerFlow>> graphs (TypeUsage type, ArbitraryProvider.SubtypeProvider subtype) {
        return Combinators.withBuilder(() -> new GraphBuilder())
            .use(this.edgeLists(type, subtype)).in((builder, edges) -> builder.withEdges(edges))
            .build(builder -> builder.build());
    }

    @Property
    void searchOrdersEqual (
        @ForAll @From("edgeLists") List<Tuple.Tuple3<Integer,Integer,Integer>> edgeList,
        @ForAll Random random) {
        // for current in memory graph impl the search order in which augmenting paths are found will change
        // if the order the edges are declared in changes. so if we see that one search order does not
        // yield the same result as another, that the algo can not always be finding the max flow. if search
        // orders return the same result, its still not guaranteed its finding max-flow, that will
        // require additional tests. if this test fails, however, we definitely know that the algo is not
        // always finding max flow.
        int last = -1;
        for (int i = 0; i < 10; i++) {
            Collections.shuffle(edgeList, random);
            Graph<String,IntegerFlow> graph = new GraphBuilder().withEdges(edgeList).build();
            int next = new FordFulkerson<>(graph, graph.get(0), graph.get(graph.vertexCount()-1)).maxflow();
            if (last < 0) last = next;
            Assertions.assertThat(next).isEqualTo(last);
        }
    }

    @Property
    void validMinCutCandidate (@ForAll @From("graphs") Graph<String,IntegerFlow> graph) {
        // given the testing constraints we are not going to find the actual min-cut, as that would involve
        // re-implementation in some form. however we can check if its possible that there is a valid min-cut
        // very easily. if we find that its not even possible that a valid min-cut is specified by a solution
        // we know that the algorithm can not be finding the true max-flow.
        Vertex<String,IntegerFlow> source = graph.get(0);
        Vertex<String,IntegerFlow> sink = graph.get(graph.vertexCount() - 1);
        MaxIntegerFlow<String,IntegerFlow> algorithm = new FordFulkerson<>(graph, source, sink);

        int flow = algorithm.maxflow();

        int possibleCut = 0;
        for (Vertex<String,IntegerFlow> vertex : graph.vertices()) {
            if (vertex == sink) continue;
            for (Traverser<Edge<String,IntegerFlow>> trav = vertex.outgoing(); trav.moveNext();) {
                if (trav.get().label().available() == 0) {
                    possibleCut += trav.get().label().flow();
                }

            }
        }

        Assertions.assertThat(possibleCut).isGreaterThanOrEqualTo(flow);
    }

in this case I am just specifying source/target vertices by id and am adding a flow component (which could be a weight or any number of other associated values). the scheme is to make a list of degree values from [degree_min,degree_max], one for each vertex, then expand that list to a list where each source is repeated degree times. once I have that list I can generate sequences of targets and labels and combine to make edges.

this is sufficient to guarantee that I have a complete list of vertices and that there are an appropriate number of outgoing edges for each vertex. however I don't see this kind of approach scaling well to adding more realistic/useful constraints. Particularly given the extra filtering and mapping steps that would likely take, and as it stands there is probably too much of that already...

For example, I think that being able to make an arbitrary for each node's edges and then join the arbitraries to make the overall list of edges might help, but I don't see any way to do that within the framework (e.g. Combine is oriented towards combining values taken from each of several lists, not joining the lists).

Looking for any suggestions to improve this.

Solution

Since your example cannot easily be reproduced (some dependencies are missing) I tried to understand from reading the code what kind of graphs you are creating. I probably have missed something.

Here's a simple approach I came up with:

@Provide
Arbitrary<Set<Tuple2<String, Set<Tuple2<String, Integer>>>>> nodes() {
    int maxVertices = 20;
    int degreeMax = 4;
    int minEdgeFlow = 1;
    int maxEdgeFlow = 10;

    Arbitrary<String> anyVertix = Arbitraries.strings().withCharRange('a', 'z').ofLength(3);
    SetArbitrary<String> anyVertices = anyVertix.set().ofMinSize(1).ofMaxSize(maxVertices);

    return anyVertices.flatMapEach((vertices, vertix) -> {

        // Single vertix is a special case
        if (vertices.size() <= 1) {
            return Arbitraries.just(Tuple.of(vertix, Collections.emptySet()));
        }

        Set<String> possibleTargetVertices = new HashSet<>(vertices);
        possibleTargetVertices.remove(vertix);

        Arbitrary<Integer> anyEdgeFlow = Arbitraries.integers().between(minEdgeFlow, maxEdgeFlow);
        Arbitrary<Tuple2<String, Integer>> anyConnection =
            Combinators.combine(Arbitraries.of(possibleTargetVertices), anyEdgeFlow).as(Tuple::of);

        SetArbitrary<Tuple2<String, Integer>> anyConnections = anyConnection.set().ofMaxSize(degreeMax);

        return anyConnections.map(connections -> Tuple.of(vertix, connections));
    });
}

@Property(tries = 100)
@Report(Reporting.GENERATED)
@StatisticsReport(label = "count nodes", format = NumberRangeHistogram.class)
@StatisticsReport(label = "max degree", format = Histogram.class)
void checkNodes(@ForAll("nodes") Set<Tuple2<String, Set<Tuple2<String, Integer>>>> nodes) {
    Statistics.label("count nodes").collect(nodes.size());

    int maxDegree = nodes.stream().mapToInt(node -> node.get2().size()).max().orElse(0);
    Statistics.label("max degree").collect(maxDegree);
}

This provider method does not generate a list of edges but a set of vertices with their connections and edge flow per connection. One can be transformed into the other, of course.

The thing I strive for in generators is to minimize the amount of flat mapping because flat mapping often makes shrinking harder.

That said, graph generation is a topic you can get a PHD on, and some poeple have. There are a few standard ways to generate graphs (e.g. https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model) and a lot of more sophisticated ways. You may want to check a few other SO answers and resources:

Translating those approaches into jqwik code is sometimes easy, sometimes more involved.

P.S. TypeUsage type, ArbitraryProvider.SubtypeProvider subtype parameters are optional. Only add the ones you use in the method.