In two weeks, I'll be partaking in a contest where people have to travel the most kilometers by train in The Netherlands. Everyone has 24 hours to travel, and the person with the longest distance travelled wins. However, you can only travel along each 'section' once. For example, if you travel from Rotterdam to Amsterdam and back from Amsterdam to The Hague, a big part will not count since you've already been there. If you come along the same section twice, your kilometers will not count. To get the optimal itinerary, I want to use the power of algorithms :).
In order to find the best route, I decided to use Python and use the networkx
package to get a visualisation of the Dutch railways. So far so good, but now comes the fun part: the algorithm. Given a graph with all railway sections and distances, how can you solve the problem? Here is the graph, without distances.
It seems to me like this is a combination of the Travelling Salesman Problem (except you can visit cities multiple times), a Maximum Flow optimization, and some kind of inverted Dijkstra algorithm :p. Is there an existing algorithm that can solve this? Or will I need to construct something myself? If the latter, is something like backtracking a good approach?
I think the first thing to note is that classical graph-based algorithms like longest path won't really apply here because of the time table, so I would frame this as a mixed integer linear programming problem. You would define two types of variables:
x_t
for whether a given train trip t
(identified by source c1
, destination c2
, start time t1
, and end time t2
) are used.y_s
for whether a given segment s
(identified by source c1
and destination c2
) is used (either once or multiple times) during the day.The objective of the optimization problem is to maximize the sum of segment distances d_s
times the indicator for whether that segment is used y_s
across all segments s
. Because the segment indicator never exceeds 1 (even if we used it many times), this handles the "double counting segments" concern.
The first type of constraints you need ensure that we actually take a valid trip. For any given train trip t
from source c1
starting at time t1
, the trip can be taken (x_t = 1
) only if the number of prior trips into c1
minus the number of prior trips out of c1
is equal to 1. This ensures that we are actually at c1
when we take a train from c1
to somewhere else. If the set of all prior train trips into c1
is i1, i2, ..., in
and the set of all prior train trips out of c1
is o1, o2, ..., om
, then this constraint is x_t <= x_i1 + x_i2 + ... + x_in - x_o1 - x_o2 - ... - x_om
. Note that this creates a tricky situation because we won't have a trip into the starting city of the entire itinerary. Therefore we'll create a fake starting city S
and trips (of distance 0) from S
to each other city at a time before all the other trips. If we call the trips from S
to each other city s1, s2, ..., sn
, then we'll add constraint x_s1 + x_s2 + ... + x_sn = 1
so we have a single starting city for our itinerary.
The other constraints we need ensure that the y_s
variables are set properly. In particular, we need to ensure y_s
is set to 0 if none of the trips t1, t2, ..., t_n
that use segment s
are used throughout the day. This can be accomplished with y_s <= x_t1 + x_t2 + ... + x_tn
.
You can implement this in a surprisingly readable and straightforward way with the pulp package in Python. I'll use dists
to indicate segments and their lengths and trains
to indicate all trains (tuples with start location, end location, start time, end time). By inspection, for this network we would expect just to go D to E for distance 3:
import pulp
dist = {("A", "B"): 1.0,
("B", "C"): 1.0,
("D", "E"): 3.0}
trains = [("A", "B", 1.0, 2.0),
("B", "C", 2.0, 3.0),
("C", "B", 3.5, 4.5),
("B", "A", 4.5, 5.5),
("D", "E", 1.0, 5.5)]
sources = set(list([t[0] for t in trains]))
x = pulp.LpVariable.dicts("x", trains, lowBound=0, upBound=1, cat=pulp.LpInteger)
y = pulp.LpVariable.dicts("y", dist.keys(), lowBound=0, upBound=1, cat=pulp.LpInteger)
s = pulp.LpVariable.dicts("s", sources, lowBound=0, upBound=1, cat=pulp.LpInteger)
mod = pulp.LpProblem("Train Optimization", pulp.LpMaximize)
# Objective
mod += sum([dist[k] * y[k] for k in dist])
# Feasibility
for t in trains:
mod += x[t] <= s[t[0]] + sum([x[k] for k in trains if k[1] == t[0] and k[3] <= t[2]]) - sum([x[k] for k in trains if k != t and k[0] == t[0] and k[2] <= t[2]])
mod += sum([s[k] for k in sources]) == 1
# Valid y variables
for k in dist:
mod += y[k] <= sum([x[t] for t in trains if (t[0] == k[0] and t[1] == k[1]) or (t[1] == k[0] and t[0] == k[1])])
# Solve
mod.solve()
for t in trains:
print "Trip", t, "used:", x[t].value()
As expected, we get:
Trip ('A', 'B', 1.0, 2.0) used: 0.0
Trip ('B', 'C', 2.0, 3.0) used: 0.0
Trip ('C', 'B', 3.5, 4.5) used: 0.0
Trip ('B', 'A', 4.5, 5.5) used: 0.0
Trip ('D', 'E', 1.0, 5.5) used: 1.0
We could double the distances for A-B and B-C:
dist = {("A", "B"): 2.0,
("B", "C"): 2.0,
("D", "E"): 3.0}
And now, as expected, we start taking the A-B-C loop (whether we take the C-B and B-A return trips doesn't affect the objective so the optimization engine can decide either way whether to take them):
Trip ('A', 'B', 1.0, 2.0) used: 1.0
Trip ('B', 'C', 2.0, 3.0) used: 1.0
Trip ('C', 'B', 3.5, 4.5) used: 1.0
Trip ('B', 'A', 4.5, 5.5) used: 0.0
Trip ('D', 'E', 1.0, 5.5) used: 0.0