Search code examples
perloopmoose

perl class to parse specified csv file types of similar data with different columns


I'm trying to write a Moose class that parses csv files of slightly different formats with headers and return a list of objects representing the data in the files. Here's a simplified version of the code:

package MyParser;

use Moose;
use namespace::autoclean;
use Text::CSV_XS;

use MyData;  #class that represents data for each row of csv

has 'type' => ( is => 'ro', isa => 'Str', required => 1 );

sub get_data {
    my($self, $file) = @_;

    open my $fh, '<', $file || die "Can't open file $!";

    my $csv = Text::CSV_XS->new;
    $csv->column_names($csv->getline($fh));

    my @data;
    if ($self->type eq 'filetype1'){
        while (my $row = $csv->getline_hr($fh)){
            push @data, MyData->new(field1 => $row->{col1},
                                    field2 => $row->{col2},
                                    field3 => $row->{col3},
                                    );
        }
    }
    elsif ($self->type eq 'filetype2'){
        while (my $row = $csv->getline_hr($fh)){
            push @data, MyData->new(field1 => $row->{colA},
                                    field3 => _someFunction($row->{colB}), # _someFunction does some manipulation with the data
                                    field5 => $row->{colC},
                                    );
        }
    }
    elsif ($self->type eq 'filetype3'){
        while (my $row = $csv->getline_hr($fh)){
            push @data, MyData->new(field1 => $row->{column_1},
                                    field2 => _someOtherFunction($row->{column_2}),  # _someOtherFunction does some manipulation with the data
                                    field3 => $row->{column_3},
                                    field4 => $row->{column_4},
                                    field5 => $row->{column_5},
                                    );
        }
    }
    close $fh;

    return \@data;
}

__PACKAGE__->meta->make_immutable;

1;

The class MyData is just a simple data structure where some attributes have default attributes (hence the different initializations from above). Some of the csv file types also have columns that require some manipulation (e.g. a number that needs to go into a simple formula) which are file type dependent. This MyData is then returned to my main script to insert into a table in oracle.

My aim is for MyParser to handle certain specified types of csv files that can be extended if needed and return a list of MyData from get_data method. However, the method as it is now doesn't seem like an elegant/simple solution to what I'm trying to solve.

So what I'd like ask/comments on is:

Is there a better/simpler way of solving this (maybe via a design pattern such as the Factory pattern)?
Or am I trying to solve something that looks simple and making things really convoluted?


Solution

  • Instead of the repeated code in the if-elsif-elsif construct it would be cleaner if you put the field mapping rules into a configuration file. For example with a data structure like this:

    {
        filetype1 => {
            field1 => 'col1',
            field2 => 'col2',
            field3 => 'col3',
        },
        filetype2 => {
            field1 => 'colA',
            field3 => {
                function => sub {},
                params   => ['colB'],
            },
            field5 => 'colC',
        },
        filetype3 => {
            field1 => 'column1',
            field2 => {
                function => sub {},
                params   => ['column_2'],
            },
            field3 => 'column_3',
            field4 => 'column_4',
            field5 => 'column_5',
        },
    };
    

    Then you can replace the if-elsif-elsif construct with something like the following (assuming the mapping rules have been loaded and stored in $filetype_mappings):

    while (my $row = $csv->getline_hr($fh)) {
        my %my_data = map {
            my $m = $filetype_mappings->{$_};
            $_ => ( ref $m ? &{$m->{function}}(map {$row->{$_}} @{$m->{params}})
                           : $row->{$m}
            );
        } keys %$filetype_mappings;
        push @data, MyData->new(%my_data);
    }
    

    Having the mapping rules separate should make it easy to add support for new file types or make changes to the existing ones in one place.