ruby-on-rails ruby ruby-on-rails-4 background-process

Is it wise to store background process status updates in a file?

I'm working on a Ruby gem that parses a large number of records. The gem will be used by different frontends: one of them is a console Ruby script, and another is a Rails app that will launch it as a background job.

I'm looking for a way to let the frontends know the status of the job, with a message giving the percentage completed and the name of the current operation. For example:

5% Initializing...

I know that delayed_job can use a gem called progress_job, which stores the progress in ActiveRecord, and Sidekiq has similar functionality using Redis. But either option would tie my Rails app to a specific background job backend, and neither would work for non-Rails apps.

I was thinking about using a file (maybe JSON) to store the progress updates, but writing to a file hundreds of times a second for half an hour doesn't seem like good practice.

Is there a better way to notify the frontend about progress updates?

UPDATE:

After reading the comments, I think I don't need to update the status that often. Once every 5 or 6 seconds seems reasonable.

Solution

Instead of writing an update every 20 lines, write an update every N seconds. After every line, check how much time has elapsed since the last update; if it's greater than N, write an update. If your job runs for 30 minutes, then each 1% increase will take, on average, 18 seconds, so there's probably no need to update the user many times per second.
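Here is a minimal sketch of that time-based throttling in Ruby. It assumes a reporter object with a write(percent, operation) method; ThrottledProgress and that signature are illustrative names, not part of any library. It reads a monotonic clock so system clock adjustments don't distort the interval, and it always reports the final update so the frontend sees 100%:

```ruby
# Hypothetical helper: forwards progress to a reporter at most once every
# `interval` seconds, plus the very first and the final update.
class ThrottledProgress
  def initialize(reporter, total:, interval: 5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @reporter = reporter
    @total = total
    @interval = interval
    @clock = clock
    @last_report = nil
  end

  # Call after every processed record; it only does work when an update is due.
  def tick(done, operation)
    now = @clock.call
    finished = done >= @total
    return unless finished || @last_report.nil? || now - @last_report >= @interval

    @last_report = now
    @reporter.write((done * 100) / @total, operation)
  end
end

if __FILE__ == $PROGRAM_NAME
  # Minimal reporter printing lines in the "5% Initializing..." style.
  printer = Class.new do
    def write(percent, operation)
      puts "#{percent}% #{operation}"
    end
  end.new

  progress = ThrottledProgress.new(printer, total: 1_000, interval: 0.5)
  1_000.times { |i| progress.tick(i + 1, 'Parsing...') }
end
```

The clock is injectable mainly so the throttling can be tested without sleeping.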

Since you're going to have two or more output channels (terminal, web) that behave very differently, I suggest writing a common interface that each can implement. This way the code that processes the data can just call, e.g., output_obj.write without caring what output_obj is.
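In Ruby the "common interface" can simply be a duck type. This sketch assumes every output object responds to write(percent, operation); ConsoleOutput, MemoryOutput and parse_records are hypothetical names used only for illustration:

```ruby
# Hypothetical output channels sharing one duck-typed interface:
# anything that responds to write(percent, operation) will do.
class ConsoleOutput
  def initialize(io = $stderr)
    @io = io
  end

  def write(percent, operation)
    @io.puts "#{percent}% #{operation}"
  end
end

class MemoryOutput
  attr_reader :updates

  def initialize
    @updates = []
  end

  def write(percent, operation)
    @updates << [percent, operation]
  end
end

# Hypothetical processing loop in the gem: it only ever calls #write,
# so the caller decides where progress goes.
def parse_records(records, output_obj)
  records.each_with_index do |_record, i|
    # ... parse the record here ...
    output_obj.write(((i + 1) * 100) / records.size, 'Parsing...')
  end
end
```

The console script would pass something like ConsoleOutput.new, while the Rails job would pass an object that persists updates somewhere the web app can read them; the parsing code stays the same in both cases.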

For your terminal program I suggest looking at how other Unixy command-line tools behave with regard to output. At their most basic they write output to $stdout. Most also accept a filename argument. Some will write status or progress information to $stderr while writing data to $stdout, allowing users to do something like tool in.txt > out.txt and still see progress information while redirecting the data output to a file (or piping it to another tool).
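A rough sketch of that $stdout/$stderr split, assuming line-oriented input; the upcasing is only a stand-in for real record processing, and run_tool is a hypothetical name. It writes with <<, which both IO and String support, so the same method works on real streams and on strings:

```ruby
# Data goes to `out` ($stdout by default) so it can be redirected or piped;
# progress goes to `err` ($stderr by default) so it stays visible on screen.
def run_tool(input, out: $stdout, err: $stderr)
  lines = input.each_line.to_a
  lines.each_with_index do |line, i|
    out << line.chomp.upcase << "\n"          # stand-in for real processing
    percent = ((i + 1) * 100) / lines.size
    err << "\r#{percent}% Parsing..."         # \r redraws the same terminal line
  end
  err << "\n"                                 # end the progress line
end
```

A script could call run_tool(ARGF), which reads the files named on the command line (or stdin), so ruby tool.rb in.txt > out.txt sends the data to out.txt while the progress line still appears in the terminal.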

JSON makes sense as a serialization format if your data has any structure to it. If your output is very simple you might consider just printing it in a tabular format, setting $, to ENV['OFS'] (output field separator) or, in its absence, some sane default:

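#!/usr/bin/env ruby
# Output field separator: $OFS from the environment, or a tab by default.
$, = ENV['OFS'] || "\t"
# Output record separator, so each print ends with a newline.
$\ = "\n"

print 'foo', 'bar', 'baz'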

Then:

$ ruby tool.rb
foo     bar     baz

$ export OFS=';'
$ ruby tool.rb
foo;bar;baz
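Recent Ruby versions (2.7 and later) treat setting $, to a non-nil value as deprecated, so a variant that passes the separator explicitly may age better. This is a sketch; format_row is a hypothetical helper, and OFS is only the environment-variable convention used above:

```ruby
#!/usr/bin/env ruby
# Same tabular output without touching the global $,.
# The separator comes from $OFS if set, otherwise a tab.
def format_row(fields, separator = ENV.fetch('OFS', "\t"))
  fields.join(separator)
end

puts format_row(%w[foo bar baz])
```

The shell session above should behave the same with this version.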

When in doubt, go with established conventions. Be boring with your output; never clever.

For your web front-end it makes less sense to write your updates to the filesystem. Store them in ActiveRecord, Redis, or whatever datastore your app already uses, then have the browser poll for updates or subscribe over WebSockets. Do whatever's easiest; optimize and streamline later as the need arises.
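One way to plug the web side into the same interface is a reporter that serializes each update to JSON and stores it under a per-job key. This is a sketch under assumptions: StoreReporter, read_progress and the progress:<job_id> key format are hypothetical, and the store is anything responding to [] and []= (a plain Hash here; in a real app it might be a thin wrapper over Redis or an ActiveRecord model):

```ruby
require 'json'

# Hypothetical reporter for the web front-end: keeps only the latest update
# for a job, serialized as JSON, in a key-value store.
class StoreReporter
  def initialize(store, job_id)
    @store = store
    @key = "progress:#{job_id}"
  end

  def write(percent, operation)
    @store[@key] = JSON.generate({ percent: percent, operation: operation })
  end
end

# What a polling endpoint (e.g. a Rails controller action) might read back.
# Returns nil if the job hasn't reported anything yet.
def read_progress(store, job_id)
  raw = store["progress:#{job_id}"]
  raw && JSON.parse(raw)
end
```

Overwriting a single key per job keeps storage constant no matter how long the job runs, and the polling endpoint only ever has to read one value.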