What is the "proper" way to bundle a required-at-runtime data file with a Perl module, such that the module can read its contents before being used?
A simple example would be this Dictionary module, which needs to read a list of (word,definition) pairs at startup.
package Reference::Dictionary;
# TODO: This is the Dictionary, which needs to be populated from
# data-file BEFORE calling Lookup!
our %Dictionary;
sub new {
    my $class = shift;
    return bless {}, $class;
}
sub Lookup {
    my ($self, $word) = @_;
    return $Dictionary{$word};
}
1;
and a driver program, Main.pl:
use Reference::Dictionary;
my $dictionary = Reference::Dictionary->new;
print $dictionary->Lookup("aardvark");
Now, my directory structure looks like this:
root/
    Main.pl
    Reference/
        Dictionary.pm
        Dictionary.txt
I can't seem to get Dictionary.pm to load Dictionary.txt at startup. I've tried a few methods to get this to work, such as...
Using BEGIN block:
BEGIN {
    open(FP, '<', 'Dictionary.txt') or die "Can't open: $!\n";
    while (<FP>) {
        chomp;
        my ($word, $def) = split(/,/);
        $Dictionary{$word} = $def;
    }
    close(FP);
}
No dice: Perl is looking in cwd for Dictionary.txt, which is the path of the main script ("Main.pl"), not the path of the module, so this gives File Not Found.
Using DATA:
BEGIN {
    while (<DATA>) {
        chomp;
        my ($word, $def) = split(/,/);
        $Dictionary{$word} = $def;
    }
    close(DATA);
}
and at end of module
__DATA__
aardvark,an animal which is definitely not an anteater
abacus,an oldschool calculator
...
This too fails because BEGIN executes at compile-time, before DATA is available.
Hard-code the data in the module
our %Dictionary = (
    aardvark => 'an animal which is definitely not an anteater',
    abacus => 'an oldschool calculator'
    ...
);
Works, but is decidedly non-maintainable.
Similar question here: How should I distribute data files with Perl modules? but that one deals with modules installed by CPAN, not modules relative to the current script as I'm attempting to do.
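There's no need to load the dictionary at BEGIN time. BEGIN time is relative to the file being loaded. When your Main.pl says use Reference::Dictionary, all the code in Dictionary.pm is compiled and then run before the rest of Main.pl is compiled. Put the loading code at the top level of Dictionary.pm, early in the file. By the time that top-level code runs, the whole file has been compiled, so DATA is available.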
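package Reference::Dictionary;
use strict;
use warnings;
my %Dictionary; # There is no need for a global
# Top-level code runs after the file is compiled, so DATA is readable here.
while (<DATA>) {
    chomp;
    # Limit the split to 2 fields so commas inside a definition survive.
    my ($word, $def) = split(/,/, $_, 2);
    $Dictionary{$word} = $def;
}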
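To make the timing point concrete, here is a minimal sketch of the order in which a BEGIN block and ordinary top-level code run within one file. The @order array is a hypothetical log used only for illustration; it is not part of the module.

```perl
use strict;
use warnings;

our @order;   # hypothetical log of execution order, for illustration only

# Ordinary top-level statement: runs at run time, after the file is compiled.
push @order, 'top-level code';

# BEGIN block: runs as soon as it has been compiled, even though it appears later.
BEGIN { push @order, 'BEGIN block' }

print join(' -> ', @order), "\n";
```

The BEGIN block is logged first even though it comes second in the source, which is why a BEGIN block cannot see a DATA section that the compiler has not reached yet.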
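You can also load from Dictionary.txt located in the same directory as the module. The trick is to build the path from the module's own location instead of relying on the current working directory. You can get that location from __FILE__, which is the path to the current file (i.e. Dictionary.pm). Note that __FILE__ can be a relative path when the matching @INC entry is relative, so resolve it before anything changes the working directory.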
use File::Basename;
# Get the directory Dictionary.pm is located in.
my $dir = dirname(__FILE__);
open(my $fh, '<', "$dir/Dictionary.txt") or die "Can't open: $!\n";
my %Dictionary;
while (<$fh>) {
    chomp;
    # Limit the split to 2 fields so commas inside a definition survive.
    my ($word, $def) = split(/,/, $_, 2);
    $Dictionary{$word} = $def;
}
close($fh);
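If you want the path to stay valid even after the working directory changes, one option is to make it absolute once, at load time. This is only a sketch: data_file_path is a hypothetical helper name, and it uses File::Spec->rel2abs and File::Spec->catfile from core Perl instead of the string interpolation above.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: return the absolute path of $name, assumed to sit
# in the same directory as the file $module_file.
sub data_file_path {
    my ($module_file, $name) = @_;
    # rel2abs resolves a relative path against the current working directory.
    my $absolute = File::Spec->rel2abs($module_file);
    return File::Spec->catfile(dirname($absolute), $name);
}

# Inside Dictionary.pm this would be called with __FILE__.
my $path = data_file_path(__FILE__, 'Dictionary.txt');
print "$path\n";
```

Call it at the top level of the module, before any chdir, and keep the result in a lexical variable for later use.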
Which should you use? DATA is easier to distribute. A separate, parallel file is easier for non-coders to work on.
Rather than loading the whole dictionary as soon as the library is loaded, it is more polite to wait and load it when it is first needed.
use File::Basename;
# Load the dictionary from Dictionary.txt
sub _load_dictionary {
    my %dictionary;
    # Get the directory Dictionary.pm is located in.
    my $dir = dirname(__FILE__);
    open(my $fh, '<', "$dir/Dictionary.txt") or die "Can't open: $!\n";
    while (<$fh>) {
        chomp;
        # Limit the split to 2 fields so commas inside a definition survive.
        my ($word, $def) = split(/,/, $_, 2);
        $dictionary{$word} = $def;
    }
    close($fh);
    return \%dictionary;
}
# Get the possibly cached dictionary
my $Dictionary;
sub _get_dictionary {
    return $Dictionary ||= _load_dictionary();
}
sub new {
    my $class = shift;
    my $self = bless {}, $class;
    $self->{dictionary} = $self->_get_dictionary;
    return $self;
}
sub lookup {
    my $self = shift;
    my $word = shift;
    return $self->{dictionary}{$word};
}
Each object now contains a reference to the shared dictionary (eliminating the need for a global), which is loaded only once, the first time an object is created.
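To check that the caching behaves as described, here is a small self-contained sketch. It is not the module itself: it reads from an in-memory string instead of Dictionary.txt so it runs anywhere, and $load_count, $cache and $text are hypothetical names added only to observe how often the loader runs.

```perl
use strict;
use warnings;

my $load_count = 0;   # hypothetical counter: how many times the loader ran
my $cache;            # holds the dictionary once it has been loaded

sub _load_dictionary {
    $load_count++;
    # Stand-in for Dictionary.txt so the sketch needs no file on disk.
    my $text = "aardvark,an animal which is definitely not an anteater\n"
             . "abacus,an oldschool calculator\n";
    open(my $fh, '<', \$text) or die "Can't open in-memory data: $!\n";
    my %dictionary;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;                  # skip blank lines
        my ($word, $def) = split(/,/, $line, 2);   # keep commas in definitions
        $dictionary{$word} = $def;
    }
    close($fh);
    return \%dictionary;
}

# Load on first call, then hand back the same hash reference every time.
sub _get_dictionary {
    return $cache ||= _load_dictionary();
}

my $first  = _get_dictionary();
my $second = _get_dictionary();
print "loader ran $load_count time(s)\n";
```

Both calls return the same hash reference, and the loader runs only on the first one.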