I am not too experienced with Perl, but I am trying to achieve something that sounds relatively reasonable and simple.
I want to create a package variable hash that is serialized somewhere and updates once every 24 hours. Basically a cache of data from an external service for the day. I tried the following to test:
our %hashMap;
sub updateMap {
my $mapSize = scalar(keys %hashMap);
if ($mapSize == 0) {
populateMap();
}
return \%hashMap;
}
I added some logging statements and see that every time I call updateMap, the map size is always 0 so it always re-populates the map. The problem is that this is a CGI script so nothing persists.
How can I get the value of the map to stick between function calls and how can I update this map once every 24 hours? One option I have in mind is using Storable store/retrieve to save the hash to a file and retrieve later. Is it possible to check when a file was last modified in Perl to determine if 24 hours have passed?
There are a few questions here, on how to set this up and on update/persistence.
A simple and good way to organize this is to have a module for your "map," with subs that provide access, updating, saving/loading, and whatever else may be useful.
One way to keep data up-to-date is to check every time the user code retrieves the "map" from the module, for instance by checking the timestamp on the file in which data is serialized. (Other ways are mentioned at the end.)
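To answer the direct question about checking when a file was last modified: the sketch below compares the file's modification time, taken from stat, with the current time. The helper name is_stale and the default 24-hour threshold are assumptions made for illustration. Unlike the -M file test, which measures age relative to when the script started, this compares against the time of the call.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $file is missing, or was last modified
# at least $max_age seconds ago (default: 24 hours).
sub is_stale {
    my ($file, $max_age) = @_;
    $max_age //= 24 * 60 * 60;

    my @st = stat $file
        or return 1;                 # no such file yet: treat as stale

    my $age = time() - $st[9];       # element 9 of stat is mtime (epoch seconds)
    return $age >= $max_age;
}

my $file = 'data.storable';          # same file name as in the module below
print "$file needs a refresh\n" if is_stale($file);
```

In a CGI setting, where every request starts a fresh process, either this or -M works equally well; the difference only matters for long-running programs.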
The module
package MapService;
use warnings;
use strict;
use feature qw(say state);
use Data::Dump qw(dd pp);
use Exporter qw(import);
our @EXPORT_OK = qw(get_map force_update save_to_file);
use Storable qw(nstore retrieve); # consider locking versions
my $data_file = 'data.storable';
my %map;
my $_populate_map = sub {
# use "external service" call to populate (update)
state $cnt = 1;
%map = ( a => 1, b => 2, cnt => $cnt++ );
save_to_file();
};
if (-f $data_file) { # initialize
%map = %{ load_from_file($data_file) };
}
else {
$_populate_map->(); # populates and saves to file
}
my $_update_map = sub {
my $filename = $_[0] // $data_file;
if (-M $filename >= 1) { # one+ day old (age is relative to script start time)
$_populate_map->();
save_to_file(file => $filename);
}
};
sub update_map { $_update_map->(@_) }; # outside use, if supported
sub get_map { # use this call to check/update
$_update_map->(@_);
return \%map;
};
sub save_to_file {
my %opts = @_;
my $file = $opts{file} // $data_file;
my $map = $opts{map} // \%map;
nstore($map, $file) or die "Can't store data to $file: $!";
}
sub load_from_file {
my $filename = $_[0] // $data_file;
return retrieve $filename;
}
sub force_update { $_populate_map->() } # for tests
1;
with the test driver
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
use MapService qw(get_map force_update save_to_file);
my $map = get_map();
dd $map;
force_update() for 1..2; # to force more changes in "map"
dd get_map();
save_to_file(); # perhaps in END block
Repeated runs, and examination of the data file, confirm persistence and data manipulations. Little tweaks in the driver help as well, or add routines to change data at will for nicer testing.
Notes
save_to_file is called often to update the timestamp, for checks later in the same run
There is some corner-cutting, and there are some silly choices, in the module, for brevity
Lexical coderefs ($_populate_map and $_update_map) are for the module's internal use and are not visible from outside;† outside access can be given through a wrapper, as is done with update_map
Storable is always a solid choice, but there are other options. A distinct disadvantage of this module is that the data must be both written and read with it (and even module versions shouldn't differ much); its advantages are that it takes nearly any valid Perl data and is fast
In particular consider JSON and YAML, the widely used formats that work across languages (and are readable). If your data is simple enough I'd definitely recommend these
We are told that this is for a legacy system without many tools, perhaps not even cron
Consider using locks for any work with serialized data here
This is a stub for realistic arrangements, and even as it stands it needs error handling added
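The notes on JSON, locking, and error handling can be sketched together. This is only one possible arrangement, assuming the data is a plain hash of scalars and arrays that JSON can represent. The sub names save_json and load_json and the file name data.json are made up for illustration; JSON::PP and Fcntl ship with Perl. If you stay with Storable instead, it provides lock_nstore and lock_retrieve for the same purpose.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use JSON::PP;                        # in core since Perl 5.14

my $json = JSON::PP->new->canonical->pretty;

# Write the hash as JSON under an exclusive lock; die on any failure.
sub save_json {
    my ($file, $map) = @_;
    # Append mode creates the file if needed but does not clobber it
    # before the lock is held; we truncate only once we own the lock.
    open my $fh, '>>', $file      or die "Can't open $file: $!";
    flock $fh, LOCK_EX            or die "Can't lock $file: $!";
    truncate $fh, 0               or die "Can't truncate $file: $!";
    print {$fh} $json->encode($map) or die "Can't write $file: $!";
    close $fh                     or die "Can't close $file: $!";
    return 1;
}

# Read the JSON file under a shared lock and return a hashref.
sub load_json {
    my ($file) = @_;
    open my $fh, '<', $file       or die "Can't open $file: $!";
    flock $fh, LOCK_SH            or die "Can't lock $file: $!";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    return $json->decode($text);         # dies on malformed JSON
}

my %map = (a => 1, b => 2);
save_json('data.json', \%map);
my $loaded = load_json('data.json');
print "$_ => $loaded->{$_}\n" for sort keys %$loaded;
```

The locks are advisory, so they only help if every reader and writer of the file goes through these subs.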
The query from the title of this question, about how to run this once every day and keep data, is addressed above in a very simple way. How to actually do it depends on the rest of the project, and there sure are other ways.
The check of whether data needs updating is done as data is pulled from the module by get_map, so in between such calls we may have missed the need to update; if data is loaded by the caller just once (at start) we never check during the run.
One way around this is to compute the remaining time until update when the program starts, then fork another process and sleep in it for that duration, then run the update and send a signal. The main script can then update its "map" data in the signal handler.
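A minimal sketch of that fork/sleep/signal arrangement follows, for Unix-like systems. The two-second delay, the choice of USR1, and the loop standing in for the program's real work are all assumptions for the demo; a real program would compute the delay from the data file's timestamp. Here the handler only sets a flag and the main loop does the reload, which is a conservative variation on doing the work in the handler itself.

```perl
use strict;
use warnings;

# Assumed for the demo; really: seconds left until the data is a day old
my $seconds_until_update = 2;

my $update_due = 0;
$SIG{USR1} = sub { $update_due = 1 };    # keep the handler minimal

my $parent = $$;
my $pid = fork // die "Can't fork: $!";

if ($pid == 0) {                         # child: wait, then notify the parent
    sleep $seconds_until_update;
    # a real child would refresh and save the data here
    kill 'USR1', $parent;
    exit 0;
}

# parent: the main work loop checks the flag between units of work
my $reloaded = 0;
until ($reloaded) {
    sleep 1;                             # stands in for real work
    if ($update_due) {
        $update_due = 0;
        print "Update signal received, reloading map\n";
        # e.g. reload the data from file here
        $reloaded = 1;
    }
}
waitpid $pid, 0;                         # reap the child
```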
Another way would be to set up an event loop for the timer, but that is likely overkill and would raise the overall complexity a lot.
† Contrary to the mantra of how "there are no private methods" in Perl, a function given via a lexical (my) code reference cannot be seen from outside the module, since lexical variables don't exist outside their scope, and thus is truly and fully "private" to the module.
This has serious limitations for systemic use in object oriented design and in that sense there are indeed no (good) private methods, but it is possible to have internal functions, inaccessible from outside, and this is used for restricted (internal) purposes.
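A small illustration of the footnote, using a made-up Counter package: the coderef held in a lexical has no symbol-table entry, so outside code cannot look it up by name, while the named wrapper remains callable.

```perl
use strict;
use warnings;

package Counter {                        # hypothetical package, for illustration
    my $count = 0;
    my $_bump = sub { return ++$count }; # lexical: invisible outside this block

    sub next_id { return $_bump->() }    # public wrapper around the private sub
}

print Counter::next_id(), "\n";          # prints 1
print Counter::next_id(), "\n";          # prints 2

# The private coderef cannot be found by name from outside
print "Counter has no sub named _bump\n" unless Counter->can('_bump');
```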