Search code examples
javaoptaplanner

Optaplanner local search produces inconsistent state of InverseRelationShadowVariable annotated field


EDIT - turned out to be caused by inappropriate use of SolutionManager#explain(Solution_) - see the accepted answer

I am experiencing a bug in either my code or optaplanner and i can't seem to wrap my head around it.

Bug description

I am using "Chained through time" pattern with the following classes

  • Employee (extends TaskOrEmployee) - problem fact
  • Task (extends TaskOrEmployee) (contains @PlanningVariable TaskOrEmployee previousTaskOrEmployee field) (annotated by @PlanningEntity)
  • TaskOrEmployee (contains @InverseRelationShadowVariable Task nextTask field) (annotated by @PlanningEntity)

Optaplanner version: 8.44.0.Final, java 11

Construction heuristic is working fine, but local search is throwing an exception saying that the solution state is inconsistent, because task.getPreviousTaskOrEmployee().getNextTask() != task, even though I am not passing already initialized solution and I am never setting nextTask nor previousTaskOrEmployee manually.

Not knowing what to do I have even verified that the state of the solution produced by construction heuristic is consistent inside SolverManager#solveAndListen's bestSolutionConsumer.

Error stack trace:

2024-01-06 23:48:57 DEBUG   DefaultConstructionHeuristicPhase#stepEnded     CH step (1503), time spent (1256), score (0hard/-319262medium/494035696soft), selected move count (1), picked move (<263/22.6 [2]> {null -> <263/22.6 [1]>}).
2024-01-06 23:48:57 INFO    DefaultConstructionHeuristicPhase#phaseEnded    Construction Heuristic phase (0) ended: time spent (1266), best score (0hard/-319262medium/494035696soft), score calculation speed (1525/sec), step total (1504).
2024-01-06 23:48:57 ERROR   DefaultSolverManager#lambda$new$0   Solving failed for problemId (1).
java.lang.IllegalStateException: The entity (<276/22.1 [7]>) has a variable (previousTaskOrEmployee) with value (<276/22.1 [6]>) which has a sourceVariableName variable (nextTask) with a value (null) which is not that entity.
Verify the consistency of your input problem for that sourceVariableName variable.
    at org.optaplanner.core.impl.domain.variable.inverserelation.SingletonInverseVariableListener.retract(SingletonInverseVariableListener.java:76) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.domain.variable.inverserelation.SingletonInverseVariableListener.beforeVariableChanged(SingletonInverseVariableListener.java:32) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.domain.variable.listener.support.VariableChangedNotification.triggerBefore(VariableChangedNotification.java:15) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.domain.variable.listener.support.VariableChangedNotification.triggerBefore(VariableChangedNotification.java:6) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.domain.variable.listener.support.AbstractNotifiable.triggerBefore(AbstractNotifiable.java:70) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.domain.variable.listener.support.VariableListenerNotifiable.notifyBefore(VariableListenerNotifiable.java:26) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.domain.variable.listener.support.VariableListenerSupport.beforeVariableChanged(VariableListenerSupport.java:161) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.score.director.AbstractScoreDirector.beforeVariableChanged(AbstractScoreDirector.java:423) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.score.director.AbstractScoreDirector.changeVariableFacade(AbstractScoreDirector.java:435) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.heuristic.selector.move.generic.chained.ChainedChangeMove.doMoveOnGenuineVariables(ChainedChangeMove.java:60) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.heuristic.move.AbstractMove.doMoveOnly(AbstractMove.java:26) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.heuristic.move.AbstractMove.doMove(AbstractMove.java:20) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.heuristic.move.AbstractMove.doMove(AbstractMove.java:15) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.score.director.AbstractScoreDirector.doAndProcessMove(AbstractScoreDirector.java:212) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.localsearch.decider.LocalSearchDecider.doMove(LocalSearchDecider.java:117) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.localsearch.decider.LocalSearchDecider.decideNextStep(LocalSearchDecider.java:101) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.localsearch.DefaultLocalSearchPhase.solve(DefaultLocalSearchPhase.java:72) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:83) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:193) ~[optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at org.optaplanner.core.impl.solver.DefaultSolverJob.call(DefaultSolverJob.java:109) [optaplanner-core-impl-8.44.0.Final.jar:8.44.0.Final]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]

By looking at this stack trace I can tell you that both mentioned entities (<276/22.1 [7]> and <276/22.1 [6]>) are instances of Task class and not Employee (the domain is stated below).

Domain:

Simplifying things: the business wants to assign tasks to employees. I've implemented "Chained through time" pattern as described in the documentation.

// solution class
@PlanningSolution
public class TaskAssigningSolution implements Serializable {

    @ValueRangeProvider(id = "employeeRange")
    @ProblemFactCollectionProperty
    List<Employee> employees;

    // When the initial (uninitialized) solution is created, these tasks are sorted in an order in which the construction heuristic should append them to the chain
    @ValueRangeProvider(id = "taskRange")
    @PlanningEntityCollectionProperty
    List<Task> tasks;

    @PlanningScore
    HardMediumSoftScore score;
}

// superclass to Employee and Task
@PlanningEntity
public abstract class TaskOrEmployee implements Serializable {

    // https://www.optaplanner.org/docs/optaplanner/latest/shadow-variable/shadow-variable.html#bidirectionalVariable
    @InverseRelationShadowVariable(sourceVariableName = "previousTaskOrEmployee")
    Task nextTask;
}

// problem fact, chain anchor
public class Employee extends TaskOrEmployee {

    @PlanningId
    String employeeId;

    // removed bunch of fields which are irrelevant to the problem - they're readonly and used only by StartAndEndTimeUpdatingPlanningVariableListener
}

@PlanningEntity
public class Task extends TaskOrEmployee {

    // https://www.optaplanner.org/docs/optaplanner/latest/design-patterns/design-patterns.html#chainedThroughTimePattern
    @PlanningVariable(valueRangeProviderRefs = { "employeeRange", "taskRange" }, graphType = CHAINED)
    TaskOrEmployee previousTaskOrEmployee;

    @AnchorShadowVariable(sourceVariableName = "previousTaskOrEmployee")
    Employee employee;

    @ShadowVariable(sourceVariableName = "previousTaskOrEmployee", variableListenerClass = StartAndEndTimeUpdatingPlanningVariableListener.class)
    Long startTime;

    @ShadowVariable(sourceVariableName = "previousTaskOrEmployee", variableListenerClass = StartAndEndTimeUpdatingPlanningVariableListener.class)
    Long endTime;

    @PlanningId
    String taskId;

    // removed bunch of fields which are irrelevant to the problem - they're readonly and used only by StartAndEndTimeUpdatingPlanningVariableListener
}

I don't think that posting the whole StartAndEndTimeUpdatingPlanningVariableListener is necessary, I can assure you that it is not altering nextTask field in any way. It is only setting these two fields:

scoreDirector.beforeVariableChanged(task, "startTime");
task.setStartTime(resultTuple.startTime);
scoreDirector.afterVariableChanged(task, "startTime");

scoreDirector.beforeVariableChanged(task, "endTime");
task.setEndTime(resultTuple.endTime);
scoreDirector.afterVariableChanged(task, "endTime");

Configuration:

<?xml version="1.0" encoding="UTF-8"?>
<solver xmlns="https://www.optaplanner.org/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://www.optaplanner.org/xsd/solver https://www.optaplanner.org/xsd/solver/solver.xsd">

    <environmentMode>REPRODUCIBLE</environmentMode>
    <moveThreadCount>8</moveThreadCount>
    <solutionClass>package.TaskAssigningSolution</solutionClass>
    <entityClass>package.TaskOrEmployee</entityClass>
    <entityClass>package.Task</entityClass>
    <domainAccessType>REFLECTION</domainAccessType>

    <scoreDirectorFactory>
        <constraintProviderClass>package.TaskAssigningConstraintProvider</constraintProviderClass>
        <!--
            Same error happens on drools
        -->
        <constraintStreamImplType>BAVET</constraintStreamImplType>
    </scoreDirectorFactory>

    <constructionHeuristic>
        <changeMoveSelector>
        <!--
            This filter checks whether the achor of a chain that this move is trying to append the Task to (Employee)
            is capable of completing that Task (whether it has required skillset or not)
        -->
        <filterClass>package.TaskChangeMoveSelectionFilter</filterClass>
        <valueSelector>
            <!-- 
                This filter checks whether the value is an end of a chain (it's nextTask is equal to null).
                Otherwise construction heuristic would be running for hours on very large datasets.
            -->
            <filterClass>package.ConstructionHeuristicValueSelectionFilter</filterClass>
        </valueSelector>
        </changeMoveSelector>
    </constructionHeuristic>

    <localSearch>
        <localSearchType>TABU_SEARCH</localSearchType>
        <!--
            Single changeMoveSelector is not the end goal - I'm just trying to setup localSearch
            in a way which would not cause any errors, and adding more moveSelectors won't resolve those errors
        -->
        <changeMoveSelector>
            <filterClass>package.TaskChangeMoveSelectionFilter</filterClass>
        </changeMoveSelector>
    </localSearch>
</solver>

I've also tried different localSearch types (TABU_SEARCH, HILL_CLIMBING, LATE_ACCEPTANCE) - HILL_CLIMBING and LATE_ACCEPTANCE are usually able to step a few times (according to the debug logs) but they also fail with the same error eventually. As far as I understand the problem, this means that the solution is not yet corrupted after Phase 0 (Construction Heuristic) ends.


Solution

  • It turned out to be caused by something that I have omitted in the question itself thinking it is irrelevant.

    In the solveAndListen bestSolution callback I was calling ScoreManager#explain(Solution_) method. It's default implementation goes as follow:

    /**
     * Updates the given solution according to the {@link SolutionUpdatePolicy}.
     * ...
     */
    default ScoreExplanation<Solution_, Score_> explain(Solution_ solution) {
        return explain(solution, UPDATE_ALL);
    }
    

    This in conjunction with setting

    <moveThreadCount>8</moveThreadCount>
    

    Altered the solution while it was being used by another thread.

    Cloning the solution before calling explain solved the issue in my case (at least after the amount of testing I've done so far).

    final var cloner = new FieldAccessingSolutionCloner<>(
        SolutionDescriptor.buildSolutionDescriptor(
            TaskAssigningSolution.class, TaskOrEmployee.class, Task.class
        ));
    
    cloner.cloneSolution(newBestSolution);
    

    I still don't know whether I have resolved the issue or just treated the symptoms though, and because of that this answer will stay unaccepted for a while.