Search code examples
oopdesign-patternstabular

Is there any object-oriented approach or pattern to work with tabular data?


Required operations with tabular data:

  • Switch columns with rows
  • Adding columns/rows at specific places
  • Mark columns/rows/cells as belonging to the same group and fetch specific group
  • Change column/rows order
  • Calculate totals to rows/columns
  • Transform to pivot table
  • Output result as HTML/JSON

What would be a good object-oriented approach to work with tabular data?


Solution

  • I think the first thing that is needed here is a good / appropriate data structure to keep / manipulate the table data. That depends on specific requirements you can have, such as size of the table, performance requirements, etc.

    Let's assume you will use some Matrix class which provides low-level operations over table (set cell value, add/remove row, transpose, etc).

    This class will operate with basic data structures, for example, it may have get_row method which will return a list of numbers. Now we can, for example, get the summary of values in this list, but we can't just change some list item and have this change reflected in the parent Matrix (the row data is disconnected from the parent matrix structure).

    Now we can build our structure upon this Matrix class, our two targets are:

    1) make it more convenient for the user, some of the "conveniences" can be:

    • row, column and cell objects all connected to the parent Table object (if we modify the row item, it will be reflected in the parent table and other row / column / cell obejcts).
    • we can generate different views of the same data (such as pivot tables)
    • we can use special column/row addressing system (such as using 1, 2, 3, ... for rows and A, B, C, ... for columns)

    2) hide the actual data structure we use to store the table data

    • this way we can work on Matrix implementation independently without breaking user code
    • we can even replace it with different structure (maybe we find something that is faster or something that takes less memory), or we can have different implementations for different situations (for example for desktop and mobile apps)

    This is the approximate classes structure I would start with (python-like pseudo code):

    class Matrix:
        """The base data structure, implementation detail."""
    
        get_cell(int x, int y):
            """Returns cell value by x/y column and row indexes."""
    
        set_cell(int x, int y, value):
            """Sets cell value by x/y column and row indexes."""
    
        get_row(int index) -> list:
            """Returns `list` of values in the `index` row."""
    
        get_column(int index) -> list;
         """Returns `list` of values in the `index` column."""
    

    The Matrix class is a low level data structure and it should not be a part of a public interface.

    The public interface is represented by Table class and other related classes below:

    class Table:
        """The user-level interface to work with table data."""
    
        constructor():
            """Initializes Matrix object."""
            # The "_data" object is private, only to be used internally.
            self._data = Matrix()
    
        row(int number) -> Row:
            """Returns `Row` object by row number (1, 2, 3, ...)."""
            row = Row(self, number)
            self.attach(row)
            return row
    
        column(string name) -> Column:
            """Returns `Column` object by string name (A, B, C, ...)."""
            column = Column(self, name)
            self.attach(column)
            return column
    
        cell(int row_number, string col_name) -> Cell:
            """Returns `Cell` object by string name (A, B, C, ...)."""
            cell = Cell(self, row_number, col_name)
            self.attach(cell)
            return column
    
        attach(Observer observer):
            """Register an observer to be notified when Table state was changed."""
            self.observers.append(observer)
    
        _notify():
            """Notify all dependent objects about the state change."""
            for observer in self.observers:
                observer.update()
    
        ...
    

    To keep Table and Row / Column / Cell objects in-sync we can use the Observer pattern.

    Here the Table is a Subject and Row / Column / Cell are Observers. Once the state of the Table (and underlying data) is changed, we can update all dependent objects.

    class Row(Observable):
        """Table row object."""
    
        constructor(Table parent, int index):
            self.parent = parent
            self.index = index
            self._data = None
            self.update()
    
        update()
            """Update row data.
    
            Fetches the `list` or row values from the `Matrix` object.
            """
            # Note: we have two choices here - the `Row`, `Column` and `Cell` objects
            # can either access `Table._data` property directly, or `Table` can provide
            # proxy methods to modify the data (like `_get_value(x, y)`); in both cases
            # there is a private interface to work with data used by `Table`, `Row`,
            # `Column` and `Cell` classes and the implementation depends on the language,
            # in C++ these classes can be friends, in python this can be just a documented
            # agreement on how these classes should work.
            # See also the comment in the `remove` method below.
            self._data = parent._data.get_row(index)
    
        sum():
            """Returns sum of row items."""
            sum = 0
            for value in self._data:
                sum += value
            return sum
    
        cell(string col_name):
            """Returns cell object."""
            return parent.cell(self.index, col_name)
    
        remove():
            """Removes current row."""
            # Here we access `parent._data` directly, so we also have to
            # call `parent._notify` here to update other objects.
            # An alternative would be a set of proxy methods in the `Table` class
            # which would modify the data and then call the `_notify` method, in such case 
            # we would have something like `self.parent._remove_row(self.index)` here.
            self.parent._data.remove_row(self.index)
            self.parent._notify()
            self.parent.detach(self)
    

    The Column and Cell classes are similar, the Column will hold the column data and the Cell will wrap the cell value. The user-level usage can look like be this:

    table = Table()
    # Update table data
    table.cell(1, "A").set(10)
    table.cell(1, "B").set(20)
    table.row(1).cell("C").set(30)
    # Get row sum
    sum = table.row(1).sum()
    
    # Get the table row
    row = table.row(1)
    # The `remove` operation removes the row from the table and `detaches` it,
    # so it will no longer observe the `table` changes.
    row.remove()
    # Now we have the detached row and we can put it into another table,
    # so basically we cut-and-pasted the row from one table to another
    another_table.add_row(row)
    

    Using this approach you can quite easily implement such operations as copy, cut, paste. Also you can apply Command pattern here and extract these operations into small classes. This way it will also be quite easy to implement undo and redo.

    The PivotTable table can also be a special kind of Observable. Depending on the requirements to the features for the pivot table, you may find Builder pattern useful to configure the pivot table. Something like this:

    pivotBuilder = PivotBuilder(table)
    # Group by column "A" and use `SumAggregator` to aggregate grouped values.
    pivotBuilder.group_by_column("A", SumArggregator())  # or maybe pivotBuilder.groupBy(table.column("A"))
    pivotTable := pivotBuilder.get_result()
    

    The classes to export table to different formats probably don't have to be observable, so they'll just wrap the Table object and transform it to the appropriate format:

    json_table = JsonTable(table)
    data = json_table.export()
    

    Of course, the above is just one of many possible implementation options, treat them as some ideas that can be useful (or not useful) depending on specific requirements you have.

    You may find more ideas in the GoF patterns book.