How to include a data file with a Perl module?


What is the "proper" way to bundle a required-at-runtime data file with a Perl module, such that the module can read its contents before being used?

A simple example would be this Dictionary module, which needs to read a list of (word,definition) pairs at startup.

package Reference::Dictionary;

# TODO: This is the Dictionary, which needs to be populated from
#  data-file BEFORE calling Lookup!
our %Dictionary;

sub new {
  my $class = shift;
  return bless {}, $class;
}

sub Lookup {
  my ($self,$word) = @_;
  return $Dictionary{$word};
}
1;

and a driver program, Main.pl:

use Reference::Dictionary;

my $dictionary = Reference::Dictionary->new;
print $dictionary->Lookup("aardvark");

Now, my directory structure looks like this:

root/
  Main.pl
  Reference/
    Dictionary.pm
    Dictionary.txt

I can't seem to get Dictionary.pm to load Dictionary.txt at startup. I've tried a few methods to get this to work, such as...

  • Using BEGIN block:

    BEGIN {
      open(FP, '<', 'Dictionary.txt') or die "Can't open: $!\n";
      while (<FP>) {
        chomp;
        my ($word, $def) = split(/,/);
        $Dictionary{$word} = $def;
      }
      close(FP);
    }
    

    No dice: Perl resolves the relative path 'Dictionary.txt' against the current working directory (here, the directory containing Main.pl), not the directory containing the module, so open fails with "No such file or directory".

  • Using DATA:

    BEGIN {
      while (<DATA>) {
        chomp;
        my ($word, $def) = split(/,/);
        $Dictionary{$word} = $def;
      }
      close(DATA);
    }
    

    and at end of module

    __DATA__
    aardvark,an animal which is definitely not an anteater
    abacus,an oldschool calculator
    ...
    

    This too fails because BEGIN executes at compile-time, before DATA is available.

  • Hard-code the data in the module

    our %Dictionary = (
      aardvark => 'an animal which is definitely not an anteater',
      abacus => 'an oldschool calculator'
      ...
    );
    

    Works, but is decidedly non-maintainable.

There is a similar question here, "How should I distribute data files with Perl modules?", but that one deals with modules installed from CPAN, not modules located relative to the current script, as I'm attempting to do.


Solution

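  • There's no need to load the dictionary in a BEGIN block. BEGIN is relative to the file being compiled: when Main.pl says use Reference::Dictionary, all of Dictionary.pm is compiled and its top-level code is run before Main.pl continues. So put the loading code at the top level of Dictionary.pm, near the start of the file.

    package Reference::Dictionary;
    
    use strict;
    use warnings;
    
    # A file-scoped lexical is enough; there is no need for a package global.
    my %Dictionary;
    
    # Runs when the module is loaded. DATA is open by then because the whole
    # file, including the __DATA__ section, has already been compiled.
    while (<DATA>) {
        chomp;
        my ($word, $def) = split(/,/, $_, 2);  # limit 2 keeps commas in the definition
        $Dictionary{$word} = $def;
    }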
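
For reference, here is a minimal end-to-end sketch of the DATA approach. To keep it runnable as a single script, it writes the module into a temporary directory and then loads it from there; the temporary layout, the lower-case lookup method, and the two sample entries are assumptions made for illustration, not part of the original code.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a throwaway root/Reference/ layout (illustration only).
my $root = tempdir(CLEANUP => 1);
make_path("$root/Reference");

my $module_file = "$root/Reference/Dictionary.pm";
open(my $out, '>', $module_file) or die "Can't write $module_file: $!\n";
print {$out} <<'END_MODULE';
package Reference::Dictionary;

use strict;
use warnings;

my %Dictionary;

# Top-level code: runs once, when the module is loaded.
while (my $line = <DATA>) {
    chomp $line;
    my ($word, $def) = split /,/, $line, 2;
    $Dictionary{$word} = $def;
}
close(DATA);

sub new    { my $class = shift; return bless {}, $class; }
sub lookup { my ($self, $word) = @_; return $Dictionary{$word}; }

1;

__DATA__
aardvark,an animal which is definitely not an anteater
abacus,an oldschool calculator
END_MODULE
close($out) or die "Can't close $module_file: $!\n";

# Make the temporary root searchable, then load the module at run time.
unshift @INC, $root;
require Reference::Dictionary;

my $dictionary = Reference::Dictionary->new;
print $dictionary->lookup('abacus'), "\n";
```

In a real project the heredoc body would simply be the contents of Reference/Dictionary.pm, and Main.pl would load it with use as usual.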
    

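    You can also load from Dictionary.txt located in the same directory. The trick is to build the path from the module's own location instead of relying on the current working directory. You can get that location from __FILE__, which is the path to the current source file (i.e. Dictionary.pm) as Perl found it. That path may be relative, so if the program might chdir before the file is opened, make it absolute first (for example with File::Spec->rel2abs).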

    use File::Basename;
    
    # Get the directory Dictionary.pm is located in.
    my $dir  = dirname(__FILE__);
    my $file = "$dir/Dictionary.txt";
    
    open(my $fh, '<', $file) or die "Can't open $file: $!\n";
    
    my %Dictionary;
    while (<$fh>) {
        chomp;
        my ($word, $def) = split(/,/, $_, 2);  # limit 2 keeps commas in the definition
        $Dictionary{$word} = $def;
    }
    close($fh);
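
Because __FILE__ can be a relative path, one defensive variant is to resolve the directory to an absolute path once, at load time. The sketch below assumes that approach; data_file_path and $MODULE_DIR are hypothetical names introduced only for this example.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Resolve this file's directory once, at load time, before any chdir can
# change what a relative __FILE__ would refer to.
my $MODULE_DIR = dirname(File::Spec->rel2abs(__FILE__));

# Hypothetical helper: absolute path of a data file stored next to this file.
sub data_file_path {
    my ($name) = @_;
    return File::Spec->catfile($MODULE_DIR, $name);
}

my $before = data_file_path('Dictionary.txt');

# Changing directory later does not affect the already-resolved path.
chdir(File::Spec->tmpdir) or die "Can't chdir: $!\n";
my $after = data_file_path('Dictionary.txt');

print "Absolute: ",
    (File::Spec->file_name_is_absolute($before) ? "yes" : "no"), "\n";
print "Unchanged after chdir: ", ($before eq $after ? "yes" : "no"), "\n";
```

This matters mostly for the lazy-loading style, where the file may be opened long after the module was loaded.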
    

    Which should you use? DATA is easier to distribute. A separate, parallel file is easier for non-coders to work on.


    Rather than loading the whole dictionary as soon as the module is loaded, it is more polite to wait and load it only when it is first needed.

    use File::Basename;
    
    # Load the dictionary from Dictionary.txt
    sub _load_dictionary {
        my %dictionary;
    
        # Get the directory Dictionary.pm is located in.
        my $dir  = dirname(__FILE__);
        my $file = "$dir/Dictionary.txt";
    
        open(my $fh, '<', $file) or die "Can't open $file: $!\n";
    
        while (<$fh>) {
            chomp;
            my ($word, $def) = split(/,/, $_, 2);  # limit 2 keeps commas in the definition
            $dictionary{$word} = $def;
        }
        close($fh);
    
        return \%dictionary;
    }
    
    # Get the dictionary, loading it on first use and caching it afterwards
    my $Dictionary;
    sub _get_dictionary {
        return $Dictionary ||= _load_dictionary();
    }
    
    sub new {
        my $class = shift;
    
        my $self = bless {}, $class;
        $self->{dictionary} = $self->_get_dictionary;
    
        return $self;
    }
    
    sub lookup {
        my $self = shift;
        my $word = shift;
    
        return $self->{dictionary}{$word};
    }
    

    Each object now holds a reference to the same shared dictionary (eliminating the need for a global), and the file is read only once, when the first object is created.
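
To see the caching behaviour in isolation, here is a small sketch of the ||= idiom with the file-reading loader replaced by a stub. The $load_count counter and the one-entry hash are assumptions added purely to make the effect observable.

```perl
use strict;
use warnings;

# Stub loader standing in for the file-reading version; it counts its calls.
my $load_count = 0;
sub _load_dictionary {
    $load_count++;
    return { aardvark => 'an animal which is definitely not an anteater' };
}

# Cache: the loader runs only while $Dictionary is still undefined.
my $Dictionary;
sub _get_dictionary {
    return $Dictionary ||= _load_dictionary();
}

my $first  = _get_dictionary();
my $second = _get_dictionary();

print "Loader calls: $load_count\n";
print "Same hash shared: ", ($first == $second ? "yes" : "no"), "\n";
```

Comparing the two references with == checks that both callers received the very same hash, which is why every object built by new can share one copy of the data.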