I'm trying to write a Moose class that parses csv files of slightly different formats with headers and return a list of objects representing the data in the files. Here's a simplified version of the code:
package MyParser;
use Moose;
use namespace::autoclean;
use Text::CSV_XS;
use MyData; #class that represents data for each row of csv
has 'type' => ( is => 'ro', isa => 'Str', required => 1 );
sub get_data {
my($self, $file) = @_;
open my $fh, '<', $file || die "Can't open file $!";
my $csv = Text::CSV_XS->new;
$csv->column_names($csv->getline($fh));
my @data;
if ($self->type eq 'filetype1'){
while (my $row = $csv->getline_hr($fh)){
push @data, MyData->new(field1 => $row->{col1},
field2 => $row->{col2},
field3 => $row->{col3},
);
}
}
elsif ($self->type eq 'filetype2'){
while (my $row = $csv->getline_hr($fh)){
push @data, MyData->new(field1 => $row->{colA},
field3 => _someFunction($row->{colB}), # _someFunction does some manipulation with the data
field5 => $row->{colC},
);
}
}
elsif ($self->type eq 'filetype3'){
while (my $row = $csv->getline_hr($fh)){
push @data, MyData->new(field1 => $row->{column_1},
field2 => _someOtherFunction($row->{column_2}), # _someOtherFunction does some manipulation with the data
field3 => $row->{column_3},
field4 => $row->{column_4},
field5 => $row->{column_5},
);
}
}
close $fh;
return \@data;
}
__PACKAGE__->meta->make_immutable;
1;
The class MyData is just a simple data structure where some attributes have default attributes (hence the different initializations from above). Some of the csv file types also have columns that require some manipulation (e.g. a number that needs to go into a simple formula) which are file type dependent. This MyData is then returned to my main script to insert into a table in oracle.
My aim is for MyParser to handle certain specified types of csv files that can be extended if needed and return a list of MyData from get_data method. However, the method as it is now doesn't seem like an elegant/simple solution to what I'm trying to solve.
So what I'd like ask/comments on is:
Is there a better/simpler way of solving this (maybe via a design pattern such as the Factory pattern)?
Or am I trying to solve something that looks simple and making things really convoluted?
Instead of the repeated code in the if-elsif-elsif construct it would be cleaner if you put the field mapping rules into a configuration file. For example with a data structure like this:
{
filetype1 => {
field1 => 'col1',
field2 => 'col2',
field3 => 'col3',
},
filetype2 => {
field1 => 'colA',
field3 => {
function => sub {},
params => ['colB'],
},
field5 => 'colC',
},
filetype3 => {
field1 => 'column1',
field2 => {
function => sub {},
params => ['column_2'],
},
field3 => 'column_3',
field4 => 'column_4',
field5 => 'column_5',
},
};
Then you can replace the if-elsif-elsif construct with something like the following
(assuming the mapping rules have been loaded and stored in $filetype_mappings
):
while (my $row = $csv->getline_hr($fh)) {
my %my_data = map {
my $m = $filetype_mappings->{$_};
$_ => ( ref $m ? &{$m->{function}}(map {$row->{$_}} @{$m->{params}})
: $row->{$m}
);
} keys %$filetype_mappings;
push @data, MyData->new(%my_data);
}
Having the mapping rules separate should make it easy to add support for new file types or make changes to the existing ones in one place.