Required operations with tabular data:
What would be a good object-oriented approach to work with tabular data?
I think the first thing that is needed here is a good / appropriate data structure to keep / manipulate the table data. That depends on specific requirements you can have, such as size of the table, performance requirements, etc.
Let's assume you will use some Matrix
class which provides low-level operations over table (set cell value, add/remove row, transpose, etc).
This class will operate with basic data structures, for example, it may have get_row
method which will return a list
of numbers.
Now we can, for example, get the summary of values in this list, but we can't just change some list item and have this change reflected in the parent Matrix
(the row data is disconnected from the parent matrix structure).
Now we can build our structure upon this Matrix
class, our two targets are:
1) make it more convenient for the user, some of the "conveniences" can be:
Table
object (if we modify the row item, it will be reflected in the parent table and other row / column / cell obejcts).2) hide the actual data structure we use to store the table data
Matrix
implementation independently without breaking user codeThis is the approximate classes structure I would start with (python-like pseudo code):
class Matrix:
"""The base data structure, implementation detail."""
get_cell(int x, int y):
"""Returns cell value by x/y column and row indexes."""
set_cell(int x, int y, value):
"""Sets cell value by x/y column and row indexes."""
get_row(int index) -> list:
"""Returns `list` of values in the `index` row."""
get_column(int index) -> list;
"""Returns `list` of values in the `index` column."""
The Matrix
class is a low level data structure and it should not be a part of a public interface.
The public interface is represented by Table
class and other related classes below:
class Table:
"""The user-level interface to work with table data."""
constructor():
"""Initializes Matrix object."""
# The "_data" object is private, only to be used internally.
self._data = Matrix()
row(int number) -> Row:
"""Returns `Row` object by row number (1, 2, 3, ...)."""
row = Row(self, number)
self.attach(row)
return row
column(string name) -> Column:
"""Returns `Column` object by string name (A, B, C, ...)."""
column = Column(self, name)
self.attach(column)
return column
cell(int row_number, string col_name) -> Cell:
"""Returns `Cell` object by string name (A, B, C, ...)."""
cell = Cell(self, row_number, col_name)
self.attach(cell)
return column
attach(Observer observer):
"""Register an observer to be notified when Table state was changed."""
self.observers.append(observer)
_notify():
"""Notify all dependent objects about the state change."""
for observer in self.observers:
observer.update()
...
To keep Table
and Row /
Column /
Cell
objects in-sync we can use the Observer pattern.
Here the Table
is a Subject
and Row
/ Column
/ Cell
are Observers
.
Once the state of the Table
(and underlying data) is changed, we can update all dependent objects.
class Row(Observable):
"""Table row object."""
constructor(Table parent, int index):
self.parent = parent
self.index = index
self._data = None
self.update()
update()
"""Update row data.
Fetches the `list` or row values from the `Matrix` object.
"""
# Note: we have two choices here - the `Row`, `Column` and `Cell` objects
# can either access `Table._data` property directly, or `Table` can provide
# proxy methods to modify the data (like `_get_value(x, y)`); in both cases
# there is a private interface to work with data used by `Table`, `Row`,
# `Column` and `Cell` classes and the implementation depends on the language,
# in C++ these classes can be friends, in python this can be just a documented
# agreement on how these classes should work.
# See also the comment in the `remove` method below.
self._data = parent._data.get_row(index)
sum():
"""Returns sum of row items."""
sum = 0
for value in self._data:
sum += value
return sum
cell(string col_name):
"""Returns cell object."""
return parent.cell(self.index, col_name)
remove():
"""Removes current row."""
# Here we access `parent._data` directly, so we also have to
# call `parent._notify` here to update other objects.
# An alternative would be a set of proxy methods in the `Table` class
# which would modify the data and then call the `_notify` method, in such case
# we would have something like `self.parent._remove_row(self.index)` here.
self.parent._data.remove_row(self.index)
self.parent._notify()
self.parent.detach(self)
The Column
and Cell
classes are similar, the Column
will hold the column data and the Cell
will wrap the cell value.
The user-level usage can look like be this:
table = Table()
# Update table data
table.cell(1, "A").set(10)
table.cell(1, "B").set(20)
table.row(1).cell("C").set(30)
# Get row sum
sum = table.row(1).sum()
# Get the table row
row = table.row(1)
# The `remove` operation removes the row from the table and `detaches` it,
# so it will no longer observe the `table` changes.
row.remove()
# Now we have the detached row and we can put it into another table,
# so basically we cut-and-pasted the row from one table to another
another_table.add_row(row)
Using this approach you can quite easily implement such operations as copy, cut, paste. Also you can apply Command pattern here and extract these operations into small classes. This way it will also be quite easy to implement undo and redo.
The PivotTable
table can also be a special kind of Observable
.
Depending on the requirements to the features for the pivot table, you may find Builder pattern useful to configure the pivot table.
Something like this:
pivotBuilder = PivotBuilder(table)
# Group by column "A" and use `SumAggregator` to aggregate grouped values.
pivotBuilder.group_by_column("A", SumArggregator()) # or maybe pivotBuilder.groupBy(table.column("A"))
pivotTable := pivotBuilder.get_result()
The classes to export table to different formats probably don't have to be observable, so they'll just wrap the Table
object and transform it to the appropriate format:
json_table = JsonTable(table)
data = json_table.export()
Of course, the above is just one of many possible implementation options, treat them as some ideas that can be useful (or not useful) depending on specific requirements you have.
You may find more ideas in the GoF patterns book.