I have upgraded to Optaplanner 7.12, and in hunting for a potential issue with multithreading mixed with VariableListeners, noticed an oddity in reproducible execution: if the configuration file has <moveThreadCount>1</moveThreadCount>
, the execution is NOT the same as when the moveThreadCount line is absent, which seems unexpected to me as a user, and might be intertwined with the potential optaplanner race condition I am seeing (noted at the end of this post).
Code details
I observe this for a configuration file in REPRODUCIBLE mode with a fixed seed:
<environmentMode>REPRODUCIBLE</environmentMode>
<randomSeed>50</randomSeed>
The difference in Optaplanner behavior is visible during my use of a VariableListener. I have a set of custom MoveFactory classes for a nurse-rostering-derived model. Each custom factory generates a different set of moves for each STEP, and each decides on their moves based on a common set of state-dependent computationally intensive pre-calculations. I created a MoveFactoryHelper class that makes the precalculations, and then call the helper at the beginning of each custom MoveFactory's createMoveList method (I have not yet attempted to migrate to the newer Optaplanner iterative move generation option).
To avoid repeating the computations for each of the move factories, the MoveFactoryHelper stores its result for re-use, and decides when to re-compute based on a 'dirty' flag that is set in a VariableListener registered on a (completely unused) shadow to the model's PlanningEntity, and cleared by the MoveFactoryHelper when the re-computations occur:
ShiftAssignment.java
@PlanningEntity(movableEntitySelectionFilter = MovableShiftAssignmentSelectionFilter.class,
difficultyComparatorClass = ShiftAssignmentDifficultyComparator.class)
@XStreamAlias("ShiftAssignment")
public class ShiftAssignment extends AbstractPersistable {
...
@PlanningVariable(valueRangeProviderRefs = {"employeeRange"},
strengthComparatorClass = EmployeeStrengthComparator.class
)
private Employee employee;
...
@CustomShadowVariable( variableListenerClass=UpdatingEmployeeVariableListener.class,
sources={@PlanningVariableReference(variableName="employee", entityClass=ShiftAssignment.class)})
private Employee notifierEmployee; // TODO is there a better way to notify move factory of changes in problem facts?
UpdatingEmployeeVariableListener.java
private static final Logger logger = LoggerFactory.getLogger(UpdatingEmployeeVariableListener.class);
private static final boolean initiallyDirty = true;
private static Map<Thread, Boolean> employeeShiftAssignmentEntityDirty = new HashMap<Thread, Boolean>();
private static Map<Thread, Boolean> employeeShiftAssignmentMapDirty = new HashMap<Thread, Boolean>();
private static final boolean useThreadFlags = false;
// debug monitoring
private static Map<Thread, Integer> countDirtyAllFlags = new HashMap<Thread, Integer>();
private static Map<Thread, Integer> countBeforeEntityAdded = new HashMap<Thread, Integer>();
private static Map<Thread, Integer> countAfterEntityAdded = new HashMap<Thread, Integer>();
private static Map<Thread, Integer> countBeforeVariableChanged = new HashMap<Thread, Integer>();
private static Map<Thread, Integer> countAfterVariableChanged = new HashMap<Thread, Integer>();
private static Map<Thread, Integer> countBeforeEntityRemoved = new HashMap<Thread, Integer>();
private static Map<Thread, Integer> countAfterEntityRemoved = new HashMap<Thread, Integer>();
public UpdatingEmployeeVariableListener() {
// no action
}
private static Thread getActiveThread() {
return useThreadFlags ? Thread.currentThread() : null;
}
public static void setFlagsDirty() {
countDirtyAllFlags.put(getActiveThread(), 1+countDirtyAllFlags.getOrDefault(getActiveThread(), 0));
employeeShiftAssignmentEntityDirty.put(getActiveThread(), true);
employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
}
@Override
public void beforeEntityAdded(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector, ShiftAssignment entity) {
countBeforeEntityAdded.put(getActiveThread(), 1+countBeforeEntityAdded.getOrDefault(getActiveThread(), 0));
employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
}
@Override
public void afterEntityAdded(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector, ShiftAssignment entity) {
countAfterEntityAdded.put(getActiveThread(), 1+countAfterEntityAdded.getOrDefault(getActiveThread(), 0));
employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
}
@Override
public void beforeVariableChanged(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector,
ShiftAssignment entity) {
countBeforeVariableChanged.put(getActiveThread(), 1+countBeforeVariableChanged.getOrDefault(getActiveThread(), 0));
employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
}
@Override
public void afterVariableChanged(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector,
ShiftAssignment entity) {
countAfterVariableChanged.put(getActiveThread(), 1+countAfterVariableChanged.getOrDefault(getActiveThread(), 0));
employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
}
@Override
public void beforeEntityRemoved(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector, ShiftAssignment entity) {
countBeforeEntityRemoved.put(getActiveThread(), 1+countBeforeEntityRemoved.getOrDefault(getActiveThread(), 0));
employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
}
@Override
public void afterEntityRemoved(@SuppressWarnings("rawtypes") ScoreDirector scoreDirector, ShiftAssignment entity) {
countAfterEntityRemoved.put(getActiveThread(), 1+countAfterEntityRemoved.getOrDefault(getActiveThread(), 0));
employeeShiftAssignmentMapDirty.put(getActiveThread(), true);
}
/**
* @return the employeeShiftAssignmentEntityDirty
*/
public static boolean isEmployeeShiftAssignmentEntityDirty() {
return employeeShiftAssignmentEntityDirty.getOrDefault(getActiveThread(), initiallyDirty);
}
/**
* clears isEntityDirty, implying that the (externally maintained) employee shift assignment entity list has been updated
*/
public static void clearEmployeeShiftAssignmentEntityDirty() {
employeeShiftAssignmentEntityDirty.put(getActiveThread(), false);
}
/**
* @return the mapDirty (which is depending also on entityDirty)
*/
public static boolean isEmployeeShiftAssignmentMapDirty() {
return employeeShiftAssignmentMapDirty.getOrDefault(getActiveThread(), initiallyDirty) || isEmployeeShiftAssignmentEntityDirty();
}
/**
* clears isMapDirty, implying that the (externally maintained) employee shift assignment map has been updated (as well as the underlying entity)
*/
public static void clearEmployeeShiftAssignmentMapDirty() {
clearEmployeeShiftAssignmentEntityDirty();
employeeShiftAssignmentMapDirty.put(getActiveThread(), false);
logger.debug("Clearing dirty flag: (AF={}, BEA={}, AEA={}, BVC={}, AVC={}, BER={}, AER={}) thread={}, employeeShiftAssignmentEntityDirty={}, employeeShiftAssignmentMapDirty={}",
countDirtyAllFlags.getOrDefault(getActiveThread(), 0),
countBeforeEntityAdded.getOrDefault(getActiveThread(), 0),
countAfterEntityAdded.getOrDefault(getActiveThread(), 0),
countBeforeVariableChanged.getOrDefault(getActiveThread(), 0),
countAfterVariableChanged.getOrDefault(getActiveThread(), 0),
countBeforeEntityRemoved.getOrDefault(getActiveThread(), 0),
countAfterEntityRemoved.getOrDefault(getActiveThread(), 0),
getActiveThread(),
employeeShiftAssignmentEntityDirty,
employeeShiftAssignmentMapDirty);
clearCounts();
}
private static void clearCounts() {
countDirtyAllFlags.put(getActiveThread(), 0);
countBeforeEntityAdded.put(getActiveThread(), 0);
countAfterEntityAdded.put(getActiveThread(), 0);
countBeforeVariableChanged.put(getActiveThread(), 0);
countAfterVariableChanged.put(getActiveThread(), 0);
countBeforeEntityRemoved.put(getActiveThread(), 0);
countAfterEntityRemoved.put(getActiveThread(), 0);
}
}
(note that the maps-of-booleans and maps-of-integers are effectively single booleans and integers here, because the Thread in the map lookup is currently always null due to the final useThreadFlags=false
)
I confirmed that only the MoveFactory objects call the MoveFactoryHelper. Similarly, other than the VariableListener annotation above, and the flag queries/clear from MoveFactoryHelper, the only other calls to the UpdatingEmployeeVariableListener are to call UpdatingEmployeeVariableListener.setFlagsDirty()
just before solving is started:
@Override
public void actionPerformed(ActionEvent e) {
UpdatingEmployeeVariableListener.setFlagsDirty();
setSolvingState(true);
Solution_ problem = solutionBusiness.getSolution();
new SolveWorker(problem).execute();
}
and just after solving is stopped:
solver.terminateEarly();
UpdatingEmployeeVariableListener.setFlagsDirty();
The template of maps-by-thread is new, but the underlying use of boolean flags has been executing for years successfully:
true
. MoveFactoryHelper re-computes based on the current state and then calls to clear the dirty flagsLog details showing odd Optaplanner behavior
With the upgrade to 7.12, and without a moveThreadCount configuration line, the code continues to run correctly and reproducibly when I do not have the moveThreadCount xml element defined:
11:20:37.274 INFO Solving started: time spent (422), best score (0hard/-5340soft), environment mode (REPRODUCIBLE), random (JDK with seed 50).
11:20:37.280 DEBUG CH step (0), time spent (428), score (0hard/-5340soft), selected move count (1), picked move ((NullEmployee-nochange) 2018-12-25/D/0 {...}).
11:20:37.280 INFO Construction Heuristic phase (0) ended: time spent (428), best score (0hard/-5340soft), score calculation speed (1000/sec), step total (1).
11:20:37.561 DEBUG Clearing dirty flag: (AF=1, BEA=0, AEA=0, BVC=0, AVC=0, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
11:20:44.303 DEBUG LS step (0), time spent (7451), score (0hard/-4919soft), new best score (0hard/-4919soft), accepted/selected move count (1/300), picked move ([(WeekAlign-f) {...}, (WeekAlign-f) {...}]).
11:20:44.310 DEBUG Factories(10) STEP moves: 1594020
11:20:44.312 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=13800, AVC=13800, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
11:20:46.609 DEBUG LS step (1), time spent (9757), score (0hard/-5266soft), best score (0hard/-4919soft), accepted/selected move count (1/24), picked move ((SlidePair) 2019-06-04/1/0... 1 shifts {...} <-slide-> {...} 3 shifts ...2019-06-07/1/0).
11:20:46.610 DEBUG Factories(10) STEP moves: 473969
11:20:46.613 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=746, AVC=746, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
11:20:48.124 DEBUG LS step (2), time spent (11272), score (0hard/-5083soft), best score (0hard/-4919soft), accepted/selected move count (1/110), picked move ((CloseSlack-newEmplS) 2019-05-28/D/2(7 shifts) <-swap-> {...} 2019-05-21/D/3(7 shifts)).
11:20:48.124 DEBUG Factories(10) STEP moves: 477083
(the Factories debug log line after each step is just to show how many moves the 10 custom factories offered up to the solver in that previous step)
However, when I add the <moveThreadCount>1</moveThreadCount>
line to the config file, then I see intermittent calls to rebuild the MoveFactoryHelper in the middle of Optaplanner making variable changes (see LS step 2, below):
10:46:05.413 INFO Solving started: time spent (360), best score (0hard/-5340soft), environment mode (REPRODUCIBLE), random (JDK with seed 50).
10:46:05.746 DEBUG CH step (0), time spent (693), score (0hard/-5340soft), selected move count (1), picked move ((NullEmployee-nochange) 2018-12-25/D/0 {...}).
10:46:05.746 INFO Construction Heuristic phase (0) ended: time spent (693), best score (0hard/-5340soft), score calculation speed (9/sec), step total (1).
10:46:05.949 DEBUG Clearing dirty flag: (AF=1, BEA=0, AEA=0, BVC=0, AVC=0, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
10:46:13.014 DEBUG LS step (0), time spent (7961), score (0hard/-4919soft), new best score (0hard/-4919soft), accepted/selected move count (1/300), picked move ([(WeekAlign-f) {...}, (WeekAlign-f) {...}]).
10:46:13.019 DEBUG Factories(10) STEP moves: 1594020
10:46:13.021 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=13844, AVC=13844, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
10:46:14.741 DEBUG LS step (1), time spent (9688), score (0hard/-5266soft), best score (0hard/-4919soft), accepted/selected move count (1/19), picked move ((SlidePair) 2019-06-04/1/0... 1 shifts {...} <-slide-> {...} 3 shifts ...2019-06-07/1/0).
10:46:14.741 DEBUG Factories(10) STEP moves: 473969
10:46:14.743 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=582, AVC=582, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
10:46:14.743 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=20, AVC=20, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
10:46:16.444 DEBUG LS step (2), time spent (11391), score (0hard/-5083soft), best score (0hard/-4919soft), accepted/selected move count (1/97), picked move ((CloseSlack-newEmplS) {...} 2019-05-28/D/2(7 shifts) <-swap-> {...} 2019-05-21/D/3(7 shifts)).
10:46:16.445 DEBUG Factories(10) STEP moves: 1580032
Two comments: First, there is some loss of reproducible execution, for example notice there are 13800 before/after variable changes originally, and 13844 now. I am presuming this is related to the "enabling" of multithreading, even though only one thread will be used.
Second, the number of variable changes and the details of the "split", where two calls are seen to clear the dirty flags (after rebuilding MoveFactoryHelper) has some variation from run to run, leading me to think it is a multithread race issue, for example:
12:16:27.712 INFO Solving started: time spent (375), best score (0hard/-5340soft), environment mode (REPRODUCIBLE), random (JDK with seed 50).
12:16:28.043 DEBUG CH step (0), time spent (706), score (0hard/-5340soft), selected move count (1), picked move ((NullEmployee-nochange) 2018-12-25/D/0 {...}).
12:16:28.043 INFO Construction Heuristic phase (0) ended: time spent (706), best score (0hard/-5340soft), score calculation speed (9/sec), step total (1).
12:16:28.288 DEBUG Clearing dirty flag: (AF=1, BEA=0, AEA=0, BVC=0, AVC=0, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:35.148 DEBUG LS step (0), time spent (7811), score (0hard/-4919soft), new best score (0hard/-4919soft), accepted/selected move count (1/300), picked move ([(WeekAlign-f) {...}, (WeekAlign-f) {...}]).
12:16:35.158 DEBUG Factories(10) STEP moves: 1594020
12:16:35.160 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=13821, AVC=13821, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:35.160 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=0, AVC=0, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:37.050 DEBUG LS step (1), time spent (9713), score (0hard/-5266soft), best score (0hard/-4919soft), accepted/selected move count (1/22), picked move ((SlidePair) 2019-06-04/1/0... 1 shifts {...} <-slide-> {...} 3 shifts ...2019-06-07/1/0).
12:16:37.053 DEBUG Factories(10) STEP moves: 1576812
12:16:37.054 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=763, AVC=763, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:37.055 DEBUG Clearing dirty flag: (AF=0, BEA=0, AEA=0, BVC=23, AVC=23, BER=0, AER=0) thread=null, employeeShiftAssignmentEntityDirty={null=false}, employeeShiftAssignmentMapDirty={null=false}
12:16:39.414 DEBUG LS step (2), time spent (12077), score (0hard/-5083soft), best score (0hard/-4919soft), accepted/selected move count (1/98), picked move ((CloseSlack-newEmplS) {...} 2019-05-28/D/2(7 shifts) <-swap-> {...} 2019-05-21/D/3(7 shifts)).
12:16:39.414 DEBUG Factories(10) STEP moves: 1580534
Thus my two questions:
Is it right that Optaplanner does not behave the same with no moveThreadCount definition versus a definition of 1? That seems unexpected for a user.
What might I, or Optaplanner, be doing that is causing the early call to a custom MoveFactory (to generate Move lists) before all Optaplanner's variable changes are complete after the previous step, even in a single-thread configuration? I'm wondering if the "chosen move" is implemented and new createMoveList calls begin before the last of the move-scoring/testing threads from the "last" move are all paused. Even if so, I don't know why that would result in non-reproducible execution here, unless the still-running threads are making random move selections (which would seem to yield a non-reproducible execution overall).
This is occurring in both "run" and "debug" execution environments.
Thank you.
Appears resolved by Optaplanner 7.15.0. Updating to this release will fix the problem.