I run a website that allows users to book ferry tickets. We keep local inventory of ferry schedules in our DB. The way it works is that we have code that fetches schedule data from ferry operators' systems via API calls and the fetched data is then parsed, the code then does some checks with our local inventory in our DB i.e it reads data from our schedules DB and finally updates (writes to) the DB with updated/new schedules that have been fetched and parsed.
Now, this happens numerous times during the day. We fetch schedules in slots (what I'm referring to as the CRON) at recurring intervals during the day (as schedules on the operator's end can change multiple times during the day) in the following manner:
Day 1-5 (1 hr time interval)
Day 6-10 (1 hr time interval)
Day 11-15 (1 hr time interval)
Day 16-20 (2 hrs time interval)
Day 20-30 (2 hrs time interval)
Day 31-40 (4 hrs time interval)
Day 41-50 (4 hrs time interval)
Day 51-60 (4 hrs time interval)
Day 61-70 (8 hrs time interval)
Day 71-80 (8 hrs time interval)
Day 81-90 (8 hrs time interval)
Day 91-100 (8 hrs time interval)
Day 101-110 (8 hrs time interval)
Day 111-120 (8 hrs time interval)
Day 121-130 (8 hrs time interval)
Day 131-140 (8 hrs time interval)
Day 141-150 (8 hrs time interval)
Day 151-160 (8 hrs time interval)
Day 161-170 (8 hrs time interval)
Day 171-180 (8 hrs time interval)
I want to achieve the following:
Other Constraints:
Ultimately, my goal is to maintain a local copy of the operators' schedules (as close to realtime as possible) without having the CRON take up too many resources. I also want the CRON to ideally not fail and if it does fail, I want to be notified about it and I want auto retries for a certain number of times. And I want the CRON to be separate from my server and be modular in a sense.
I'm open to both custom solutions or ideally a 3rd party service that does this well but I'm optimizing for simplicity and cost here.
I was thinking of the following approaches (Please correct me if I'm wrong):
The part that fetches schedules from the operator APIs and stores it in one table, say table1
.
The part that parses the data in table1
and reads and writes to my main schedules
table.
Offloading this service off my website's main server should free up resources solving that resource usage problem. However, I'm still left with the problem of not being able to verify if all my updates are actually taking place i.e if schedules are only getting partially updated or fully updated or knowing when and where exactly in my code are failures occurring.
Decided on the following approach:
fetched schedules
in a managed DB on DigitalOcean.fetched_schedules
, reads another table, say live_schedules
from the production DB, parses this data, performs some logical operations and writes updated schedules to the live_schedules
table.fetched_schedules
followed by simultaneous execution of fetching of Day 6-10 schedules and Part2's 1-5 schedule parsing is done so that updates can be finally written to live_schedules
live_schedules
for, failed.