Serialize a Perl package variable and update once every 24 hours from a CGI script


I am not too experienced with Perl, but I am trying to achieve something that sounds relatively reasonable and simple.

I want to create a package variable hash that is serialized somewhere and updates once every 24 hours. Basically a cache of data from an external service for the day. I tried the following to test:

our %hashMap;

sub updateMap {
    my $mapSize = scalar(keys %hashMap);
    if ($mapSize == 0) {
        populateMap();
    }

    return \%hashMap;
}

I added some logging statements and see that every time I call updateMap, the map size is always 0 so it always re-populates the map. The problem is that this is a CGI script so nothing persists.

How can I get the value of the map to stick between function calls and how can I update this map once every 24 hours? One option I have in mind is using Storable store/retrieve to save the hash to a file and retrieve later. Is it possible to check when a file was last modified in Perl to determine if 24 hours have passed?


Solution

  • There are a few questions here, on how to set this up and on update/persistence.

    A simple and good way to organize this is to have a module for your "map," with subs that provide access, updating, saving/loading, and whatever else may be useful.

    One way to keep data up-to-date is to check every time the user code retrieves the "map" from the module, for instance by checking the timestamp on the file in which data is serialized. (Other ways are mentioned at the end.)
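
    As a minimal sketch of the timestamp check on its own, assuming a helper named is_stale and a default limit of 24 hours (both are illustrative and not part of the module that follows), stat returns the last-modification time in epoch seconds:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $file is missing or was last modified
# more than $max_age_secs seconds ago (default: 24 hours).
sub is_stale {
    my ($file, $max_age_secs) = @_;
    $max_age_secs //= 24 * 60 * 60;

    my @st = stat $file
        or return 1;                  # no file yet: treat as stale

    my $mtime = $st[9];               # last-modification time, epoch seconds
    return (time - $mtime) > $max_age_secs ? 1 : 0;
}

# The -M file test gives the same information as an age in days,
# measured from the time the script started ($^T):
#   my $age_days = -M $file;
#   my $stale    = !defined $age_days || $age_days >= 1;
```

    A caller could then refresh the data only when is_stale('data.storable') returns true.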

    The module

    package MapService;
    
    use warnings;
    use strict;
    use feature qw(say state);
    use Data::Dump qw(dd pp);
    
    use Exporter qw(import);
    our @EXPORT_OK = qw(get_map force_update save_to_file);
    
    use Storable qw(nstore retrieve);  # consider locking versions
    
    my $data_file = 'data.storable';
        
    my %map;
    
    my $_populate_map = sub { 
        # use "external service" call to populate (update)
        state $cnt = 1;
        %map = ( a => 1, b => 2, cnt => $cnt++ );
        save_to_file();
    };
    
    if (-f $data_file) {                         # initialize
        %map = %{ load_from_file($data_file) };
    }
    else {
        $_populate_map->();                      # populates and saves
    }
    
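    my $_update_map = sub {
        my $filename = $_[0] // $data_file;
        # -M gives the file's age in days, measured from script start;
        # it returns undef for a missing file, which also forces a refresh
        my $age = -M $filename;
        if (!defined $age or $age >= 1) {       # missing, or one+ day old
            $_populate_map->();
            save_to_file(file => $filename);
        }
    };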
    
    sub update_map { $_update_map->(@_) };  # outside use, if supported
    
    sub get_map {                           # use this call to check/update
        $_update_map->(@_);
        return \%map;
    }
    
    sub save_to_file {
        my %opts = @_; 
        my $file = $opts{file} // $data_file;
        my $map  = $opts{map}  // \%map;
        nstore $map, $file;
    }
    
    sub load_from_file {
        my $filename = $_[0] // $data_file;
        return retrieve $filename;
    }
        
    sub force_update { $_populate_map->() }   # for tests
    
    1;
    

    with the test driver

    use warnings;
    use strict;
    use feature 'say'; 
    use Data::Dump qw(dd);
    
    use MapService qw(get_map force_update save_to_file);
    
    my $map = get_map();
    dd $map;
    
    force_update() for 1..2;   # to force more changes in "map"
    dd get_map();
    
    save_to_file();  # perhaps in END block
    

    Repeated runs, and examination of the data file, confirm persistence and data manipulations. Small tweaks to the driver, or extra routines that change the data at will, make for nicer testing.

    Notes

    • save_to_file is called often to update timestamp, for checks later in the same run

    • There's some corner-cutting and silly choices in the module, for brevity

    • Lexical coderefs ($_populate_map and $_update_map) are for the module's internal use and cannot be seen from outside; outside access can be granted through a wrapper sub, as is done with update_map

    • Storable is always a solid choice, but there are other options. A distinct disadvantage of this module is that the data must be both written and read with it (and even module versions shouldn't differ much); advantages are that it takes nearly any valid Perl data and is fast

      In particular consider JSON and YAML, the widely used formats that work across languages (and are readable). If your data is simple enough I'd definitely recommend these

    • We are told that this is for a legacy system without many tools, perhaps not even cron

    • Consider using locks for any work with serialized data here

    • This is a stub for realistic arrangements, and even as it stands it needs error handling added
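
    For the note on locking, here is a minimal sketch that uses Storable's own advisory-locking functions, lock_nstore and lock_retrieve. The helper names save_locked and load_locked are assumptions made for illustration; they mirror save_to_file and load_from_file and add basic error handling.

```perl
use strict;
use warnings;
use Storable qw(lock_nstore lock_retrieve);

# Hypothetical counterpart of save_to_file: the file is locked
# (advisory lock) while the data is written.
sub save_locked {
    my ($map, $file) = @_;
    lock_nstore($map, $file)
        or die "Cannot store to $file: $!";
    return 1;
}

# Hypothetical counterpart of load_from_file: the file is locked
# (advisory lock) while the data is read.
sub load_locked {
    my ($file) = @_;
    my $map = eval { lock_retrieve($file) };
    die "Cannot retrieve from $file: " . ($@ || $!) unless defined $map;
    return $map;
}
```

    The locks are advisory, so they only help if every process that touches the file goes through the locking calls.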

    The query from the title of this question, about how to run this once every day and keep data, is addressed above in a very simple way. How to actually do it depends on the rest of the project, and there sure are other ways.

    The check of whether data needs updating is done as data is pulled from the module by get_map, so in between such calls we may have missed the need to update; if data is loaded by the caller just once (at start) we never check during the run.

    One way around this is to compute the remaining time until update when the program starts, then fork another process and sleep in it for that duration, then run the update and send a signal. The main script can then update its "map" data in the signal handler.
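
    A rough sketch of that arrangement, assuming a Unix-like system where the USR1 signal is available. The names schedule_reload and $reload_needed are made up for illustration, and the actual update is left as a comment:

```perl
use strict;
use warnings;

# Set by the signal handler; the main program checks this flag and
# reloads its "map" data when it is true.
my $reload_needed = 0;
$SIG{USR1} = sub { $reload_needed = 1 };

# Fork a child that sleeps for $delay seconds and then signals the parent.
sub schedule_reload {
    my ($delay) = @_;
    my $parent = $$;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                  # child process
        sleep $delay;
        # ... run the update and save the data file here ...
        kill 'USR1', $parent;
        exit 0;
    }
    return $pid;                      # parent carries on with its work
}
```

    The parent would call schedule_reload with the number of seconds left until the next update, check $reload_needed at convenient points, and eventually reap the child with waitpid.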

    Another way would be to set up an event loop for the timer, but that is likely overkill and would raise the overall complexity a lot.


     Contrary to the mantra of how "there are no private methods" in Perl, a function given via a lexical (my) code reference cannot be seen from outside the module, since lexical variables don't exist outside their scope, and thus is truly and fully "private" to the module.

    This has serious limitations for systemic use in object oriented design and in that sense there are indeed no (good) private methods, but it is possible to have internal functions, inaccessible from outside, and this is used for restricted (internal) purposes.
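
    A small demonstration of the point, using a hypothetical Counter package. The enclosing block matters when everything lives in one file, since a package statement alone does not limit the scope of a lexical variable:

```perl
use strict;
use warnings;

{
    package Counter;                  # hypothetical example package

    my $_next = do {                  # lexical coderef: internal use only
        my $n = 0;
        sub { return ++$n };
    };

    sub next_id { return $_next->() } # public wrapper grants access
}

print Counter::next_id(), "\n";       # 1
print Counter::next_id(), "\n";       # 2

# Nothing named _next exists in the package's symbol table
print Counter->can('_next') ? "visible\n" : "not visible\n";   # not visible
```

    Outside the block, referring to $_next directly is a compile-time error under strict, so only next_id can reach the internal function.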