
PyTables EArray vs Table for Speed/Efficiency


I'm trying to figure out the most efficient way to store time-value pairs in PyTables. I'm using PyTables because I'm dealing with huge amounts of data. I will need to perform calculations on the data (averaging, interpolation, etc.), and I don't know the number of rows ahead of time.

I know that an EArray can be appended to, much like a Table. Is there a reason to choose one over the other?

Given my simple data structure (homogeneous time-value pairs), I figured an EArray would be the faster and more efficient choice, but the following quote from the PyTables creator himself threw me off:

"...PyTables is specially tuned for, well, tables.
And these entities wear special I/O buffers and query engines that are fined tuned for maximum speed. *Array objects do not wear the same machinery." (quote location)


Solution

  • If the columns have some particular meaning or name, then you should definitely use a Table.

    The efficiency largely depends on what kinds of operations you perform on the data. Most of the time there won't be much of a difference. An EArray might be faster for row access, a Table is probably slightly better at column access, and the two should be very similar when reading the whole dataset.

    Of course, the moment you want to do more than simply access elements, and instead want to query or transform the data, you should use a Table. Tables are built around the idea of querying, via where() methods, and indexing, which makes such operations very fast. EArrays lack this infrastructure and are therefore slower for that kind of work.
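To make the difference concrete, here is a minimal sketch, assuming PyTables 3.x and NumPy are installed. It appends the same time-value data to a Table and to an EArray in chunks, then computes the mean value inside a time window both ways. The node names (`tv_table`, `tv_earray`), the `TimeValue` description, and the helper functions `build_demo` and `mean_in_window` are hypothetical names chosen for illustration; they do not come from the question. It is meant to show the shape of the two approaches, not to serve as a benchmark.

```python
import os
import tempfile

import numpy as np
import tables


class TimeValue(tables.IsDescription):
    """Hypothetical row layout for the Table variant: one time, one value."""
    time = tables.Float64Col(pos=0)
    value = tables.Float64Col(pos=1)


def build_demo(path, n_chunks=5, chunk_len=1000):
    """Append identical synthetic data to a Table and an EArray, chunk by chunk."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "tv_table", TimeValue)
        # shape=(0, 2): the zero-length first axis is the one that can grow.
        earray = h5.create_earray(
            "/", "tv_earray", atom=tables.Float64Atom(), shape=(0, 2)
        )
        for i in range(n_chunks):
            t = np.arange(i * chunk_len, (i + 1) * chunk_len, dtype=np.float64)
            v = np.sin(t / 100.0)

            # Table: append a structured array whose fields match the columns.
            chunk = np.empty(chunk_len, dtype=[("time", "f8"), ("value", "f8")])
            chunk["time"] = t
            chunk["value"] = v
            table.append(chunk)

            # EArray: append a plain (chunk_len, 2) float array.
            earray.append(np.column_stack((t, v)))

        table.flush()
        # Optional: index the time column so range queries can use it.
        table.cols.time.create_index()


def mean_in_window(path, t0, t1):
    """Mean of `value` for t0 <= time < t1, computed from both containers."""
    with tables.open_file(path, mode="r") as h5:
        # Table: the condition is evaluated by PyTables' query machinery.
        rows = h5.root.tv_table.read_where(
            "(time >= t0) & (time < t1)", condvars={"t0": t0, "t1": t1}
        )
        table_mean = rows["value"].mean()

        # EArray: no where(); read the data and filter it in NumPy.
        # Note that [:] loads the whole array into memory.
        data = h5.root.tv_earray[:]
        mask = (data[:, 0] >= t0) & (data[:, 0] < t1)
        earray_mean = data[mask, 1].mean()
    return table_mean, earray_mean


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    build_demo(demo_path)
    print(mean_in_window(demo_path, 1000.0, 2000.0))
```

Both paths give the same answer. The practical difference is that the Table query only materializes the matching rows, while the EArray version has to read the data and filter it by hand (for data that does not fit in memory, you would have to slice the EArray in blocks yourself).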