Search code examples
perlcpanperl-moduleperl-data-structuresperl-hash

Perl multi-dimensional table with headers


I'm trying to implement a multi-dimensional table with headers.

Here's an example for 2D:

                       < dimension1 >
    /\               'column0'  'column1'
dimension0   'row0'   data00     data10
    \/       'row1'   data01     data11

The headers for rows and columns are text, and the data is anything. I want to be able to do something like this (syntax can be different, I'm beginner in Perl):

my $table = new table(2); # 2 is the number of dimensions

# the following line creates a new row/column if it didn't exist previously
$table['row0']['column0'] = data00;
$table['row0']['column1'] = data01;
$table['row1']['column0'] = data10;
$table['row1']['column1'] = data11;

# the following line returns the headers of the specified dimension
$table->headers(0);
 => ('row0', 'row1')

First question: Is there something like this already done in CPAN? (before you ask I did search for a significant amount of time and I didn't find anything like it)


Second question: Here's my try, I know it's ugly and probably wrong. Any Perl expert out there care to review my code?

package table;

sub new {
  my $class = shift;
  my $dimensions = shift;
  my $self = bless({}, $class);
  $self->{dimensions} = $dimensions;
  $self->{data} = [];
  $self->{headers} = [];
  return $self;
}

sub get_dimensions {
  my $self = shift;
  return $self->{dimensions};
}

# This function creates a header or return its index if it already existed.
# Headers are encoded as an array of hashes so that this is O(1) amortized.

sub header {
  my $self = shift;
  my $dimension = shift;
  my $header = shift;
  my $headers = $self->{headers}[$dimension];
  if(!defined($headers)) {
    $headers = $self->{headers}[$dimension] = {};
  }
  if(!defined($headers->{$header})) {
    $headers->{$header} = scalar keys %$headers;
  }
  return $headers->{$header};
}

# This function returns the list of headers. Because the headers are
# stored as a hash (`header=>index`), I need to retrieve the keys
# and sort them by value.

sub get_headers {
  my $self = shift;
  my $dimension = shift;
  my $headers = $self->{headers}[$dimension];
  return [sort { $headers->{$a} cmp $headers->{$b} } keys %$headers];
}

# This last function stores/retrieves data from the table.

sub data {
  my $self = shift;
  my $data = $self->{data};
  my $dimensions = $self->{dimensions};
  for(my $i = 0; $i < $dimensions-1; ++$i) {
    my $index = $self->header($i, shift);
    if(!defined($data->[$index])) {
      $data->[$index] = [];
    }
    $data = $data->[$index];
  }
  my $index = $self->header($dimensions-1, shift);
  my $value = shift;
  if(defined($value)) {
    $data->[$index] = $value;
  }
  return $data->[$index];
}

Solution

  • You want a structure for an "N" dimensional table. I doubt there's a CPAN module that can do this because it's just not that common a situation.

    The problem is that the data structure grows quite rapidly and so does the complexity.

    You can store an N dimensional table in a single list by using a bit of mathematics to transform the N dimensional array into a single dimension. Let's say that X represents the X dimension and X' represents the length of that dimension. For a two dimensional table, you could get the value by doing:

    X * Y` + Y.
    

    For a 3 dimensional table X, Y, Z, the answer would be:

    X * (Y' * Z') + Y * Z' + Z
    

    For a 4 dimensional table W, X, Y, Z, the answer would be:

    W * (X' * Y' * Z') + X * (Y' + Z') + Y * Z' + Z'
    

    (I hope the math is right).

    Therefore, I can imagine a structure like this for an N dimensional table. It would involve two different classes: One represents the dimensional information and the other represents the actual data (including all of the dimensions).

    • Dimension (Class)
      • Heading (alphanumeric string)
      • Size of Dimension (integer)
    • N-Table (Class)
      • Array of Dimension (Dimension Class Objects)
      • Array of Data (Alphanumeric strings)

    You can get the number of dimensions by looking at:

    my $numOfDimensions = scalar @{$ntable->{DIMENSIONS}};
    

    And, you can get the heading of dimension $x by looking at:

    my xDimensionHeading = $ntable->{DIMENSION}->[$x]->{HEADING};
    

    And, the size of that dimension by looking at:

    my xDimensionSize = $ntable->{DIMENSION}->[$x]->{SIZE};
    

    Of course, you'd do this with true object oriented calls, and not bare references, but this gives you an idea how the structure would work.

    Now, you need a way of transforming a list of integers that would represent a cell's location into a cell's location along a single dimensional array, and you'll have a way of getting and retrieving your data.

    Would this be what you're looking for?


    EDIT

    Close to it, but I actually resize the table dimensions a lot (I can't determine their size in advance) and if I understood your solution doesn't accomodate for this.

    This adds a lot of complication...

    We need to throw out the Size in the Dimension class. And, we can't use a single dimensional array to store our data.

    I hope you don't change the table dimensionality.

    We could do something like this:

    • N-Table (Class)
      • List of Dimension Headings {DIMENSION}->[]
      • List to Data {DATA}->[] (This could be a link to other lists)

    The {DATA} list is a link of lists depending on the depth of the table. For example:

     my data_3D = $table_3D->{DATA}->[$x]->[$y]->[$z];
     my data_2D = $table_2D->{DATA}->[$x]->[$y];
    

    The number of dimensions is scalar @{$table->{DIMENSION}}.

    The question is how do I access the data in a way that's dimensional neutral. I could require 2, 3, 4, or more dimensions, and I have to have someway of structuring my address to pull it out.

    We could have some sort of looping mechanism. We get a list of coordinates in @coordinates, and then look at each coordinate. The last will point to data. The rest will simply be another reference to another array.

     my $data = pop @coordinates;    #First Coordinate
     $data = $table->[$data];        #Could be data if 1D table, could be a reference
     foreach my $coordinate (@coordinates) {
        die qq(Not enough coordinates) if ref $data ne 'ARRAY';
        $data = $data->[$coordinate];   #Could be data, could be a reference
     }
    
     # Cell value is in $data
    

    It also may be possible to build a list of coordinates, and then evaluating it. Again completely untested:

     $coordinates = "[" . join ("]->[" => @coordinates . "]";
    

    If there were three coordinates, this would be

     $coordinates = "[$x]->[$y]->[$z]";
    

    I'm not sure how a 1 dimensional array would work...

    From there, you could build a statement and use eval on it and get the data.

    You'll have to have several methods.

    • Set the dimensions
    • Set a cell
    • Retrieve a cell
    • Verify table is completeness (I have no idea how this would work.

    This is more a brain dump, but it I think this might work. You don't have any set table dimensions and it might work for any N-dimensional table.