I'm working with the GTFS data for the New York City MTA subway system. I need to find the stop times for each route at a specific stop. To do that, I get the stop times from a StopTimes DataTable that I have, for a specific stop_id. I only want stop times between now and the next 2 hours.
Then, I need to lookup the trip for each stop time, using the trip_id value. From that trip, I have to lookup the route, using the route_id value, in order to get the route name or number for the stop time.
Here are the counts for each DataTable: StopTimes(522712), Trips(19092), Routes(27).
Right now, this takes anywhere from 20 seconds to 40 seconds to execute. How can I speed this up? Any and all suggestions are appreciated. Thanks!
foreach (var r in StopTimes.OrderBy(z => z.Field<DateTime>("departure_time").TimeOfDay)
.Where(z => z.Field<string>("stop_id") == stopID &&
z["departure_time"].ToString() != "" &&
z.Field<DateTime>("departure_time").TimeOfDay >= DateTime.UtcNow.AddHours(-5).TimeOfDay &&
z.Field<DateTime>("departure_time").TimeOfDay <= DateTime.UtcNow.AddHours(-5).AddHours(2).TimeOfDay))
{
var trip = (from z in Trips
where z.Field<string>("trip_id") == r.Field<string>("trip_id") &&
z["route_id"].ToString() != ""
select z).Single();
var route = (from z in Routes
where z.Field<string>("route_id") == trip.Field<string>("route_id")
select z).Single();
// do stuff (not time-consuming)
}
Try this:
var now = DateTime.UtcNow;
var tod0 = now.AddHours(-5).TimeOfDay;
var tod1 = now.AddHours(-5).AddHours(2).TimeOfDay;
var sts =
from st in StopTimes
let StopID = st.Field<string>("stop_id")
where StopID == stopID
where st["departure_time"].ToString() != ""
let DepartureTime = st.Field<DateTime>("departure_time").TimeOfDay
where DepartureTime >= tod0
where DepartureTime >= tod1
let TripID = st.Field<string>("trip_id")
select new
{
StopID,
TripID,
DepartureTime,
};
Note that there is no orderby
in this query and that we're returning an anonymous type. For your "do stuff (not time-consuming)" code to run you may need to add some more properties.
The same approach happens for Trips
& Routes
.
var ts =
from t in Trips
where t["route_id"].ToString() != ""
let TripID = t.Field<string>("trip_id")
let RouteID = t.Field<string>("route_id")
select new
{
TripID,
RouteID,
};
var rs =
from r in Routes
let RouteID = r.Field<string>("route_id")
select new
{
RouteID,
};
Since you're getting a single record for each look up then using ToDictionary(...)
is a good choice to use.
var tripLookup = ts.ToDictionary(t => t.TripID);
var routeLookup = rs.ToDictionary(r => r.RouteID);
Now your query looks like this:
var query = from StopTime in sts.ToArray()
let Trip = tripLookup[StopTime.TripID]
let Route = routeLookup[Trip.RouteID]
orderby StopTime.DepartureTime
select new
{
StopTime,
Trip,
Route,
};
Notice that I've used .ToArray()
and I've put the orderby
right at the end.
And you run your code like this:
foreach (var q in query)
{
// do stuff (not time-consuming)
}
Let me know if this helps.