How are ABAP data clusters stored in the database?

Data clusters can be stored in the database with the EXPORT and IMPORT statements and a dictionary table that follows a fixed template (it must have at least the fields MANDT, RELID, SRTFD, SRTF2, CLUSTR and CLUSTD).

Here are two example statements that store and retrieve the entire internal table ta_test under the name testtab, in a data cluster with ID TEST in area AA of the dictionary table ztest:

export testtab = ta_test to database ztest(AA) id 'TEST'.
import testtab = ta_test from database ztest(AA) id 'TEST'.
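For reference, a self-contained round trip might look like the sketch below. It assumes ztest already exists with the cluster-table layout described above; the report name z_cluster_roundtrip and the line type ty_test are hypothetical, chosen only for illustration, and the inline declarations need ABAP 7.40 or later.

```abap
REPORT z_cluster_roundtrip.

" Hypothetical line type, used only for this demo.
TYPES: BEGIN OF ty_test,
         id   TYPE i,
         text TYPE c LENGTH 20,
       END OF ty_test,
       tt_test TYPE STANDARD TABLE OF ty_test WITH EMPTY KEY.

DATA ta_test TYPE tt_test.
DATA ta_back TYPE tt_test.

" Fill the source table with 1000 sample rows.
ta_test = VALUE #( FOR n = 1 UNTIL n > 1000
                   ( id = n text = |Row { n }| ) ).

" Store the table as parameter TESTTAB in the cluster AA / 'TEST' of ZTEST.
EXPORT testtab = ta_test TO DATABASE ztest(aa) ID 'TEST'.

" Read parameter TESTTAB back into a second table.
" sy-subrc is 0 if the cluster was found, 4 otherwise.
IMPORT testtab = ta_back FROM DATABASE ztest(aa) ID 'TEST'.
IF sy-subrc <> 0.
  WRITE / 'Cluster AA / TEST not found'.
ELSEIF ta_back = ta_test.
  DATA(msg) = |Round trip OK: { lines( ta_back ) } rows|.
  WRITE / msg.
ELSE.
  WRITE / 'Imported data differs from exported data'.
ENDIF.
```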

Looking at the contents of ztest, I see rows like the following (the first four fields form the primary key):

MANDT   200
RELID   AA
SRTFD   TEST
SRTF2   0 (auto-incremented for each record)
CLUSTR  integer value with a maximum of 2,886
CLUSTD  a 128-character hexadecimal string
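To look at how the cluster is spread across rows, a query along these lines lists SRTF2 and CLUSTR for every row of the cluster and adds up the CLUSTR values. It is only a sketch: it assumes CLUSTR holds the number of bytes actually used in that row's CLUSTD, which would fit the observed maximum of 2,886 if ztest declares CLUSTD with that length (as the standard table INDX does, assuming ztest was modeled on it). The report name is hypothetical, and the query shows only the chunking; it cannot decode the bytes.

```abap
REPORT z_cluster_rows.

" Read the raw rows of cluster AA / 'TEST'. MANDT is restricted to the
" current client by Open SQL's automatic client handling.
SELECT srtf2, clustr
  FROM ztest
  INTO TABLE @DATA(rows)
  WHERE relid = 'AA'
    AND srtfd = 'TEST'
  ORDER BY srtf2.

" Sum the per-row lengths (assumed: CLUSTR = bytes used in CLUSTD).
DATA total TYPE i.
LOOP AT rows INTO DATA(row).
  total = total + row-clustr.
ENDLOOP.

DATA(msg) = |{ lines( rows ) } rows, { total } bytes in CLUSTD|.
WRITE / msg.
```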

I've also noticed that the amount of data stored this way is much smaller than what was in the internal table: for instance, 1,000,000 unique records in the internal table result in only 1,703 rows in ztest. Switching compression off in the EXPORT statement does increase the number of rows, but it is still far lower.
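The observation can be reproduced with a sketch like the following, which exports the same table once with the default settings and once with COMPRESSION OFF, and reads the number of rows written from sy-dbcnt after a SELECT COUNT(*). It assumes that compression is on by default for database clusters (consistent with the row count rising when it is switched off) and that a second EXPORT under the same area and ID replaces the first cluster; the report name, line type and table size are arbitrary.

```abap
REPORT z_cluster_compression.

" Hypothetical line type, used only for this demo.
TYPES: BEGIN OF ty_test,
         id   TYPE i,
         text TYPE c LENGTH 20,
       END OF ty_test,
       tt_test TYPE STANDARD TABLE OF ty_test WITH EMPTY KEY.

DATA(ta_test) = VALUE tt_test( FOR n = 1 UNTIL n > 100000
                               ( id = n text = |Row { n }| ) ).

" Export with default settings and count the rows written.
EXPORT testtab = ta_test TO DATABASE ztest(aa) ID 'TEST'.
SELECT COUNT(*) FROM ztest WHERE relid = 'AA' AND srtfd = 'TEST'.
DATA(rows_compressed) = sy-dbcnt.

" Export again without compression; this replaces the cluster AA / 'TEST'.
EXPORT testtab = ta_test TO DATABASE ztest(aa) ID 'TEST' COMPRESSION OFF.
SELECT COUNT(*) FROM ztest WHERE relid = 'AA' AND srtfd = 'TEST'.
DATA(rows_plain) = sy-dbcnt.

DATA(msg) = |Compressed: { rows_compressed } rows, uncompressed: { rows_plain } rows|.
WRITE / msg.
```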

My questions: does anyone know exactly how this works? Is the actual data stored elsewhere, with ztest containing only pointers to it? Is it compressed? Encrypted? Can the actual data be accessed directly from the database, bypassing the ABAP layer?

Solution

The internal format of data clusters is not documented (at least not in any official documentation). From my experience, the table does contain the entire data and not just pointers: transporting the table entries to a different system (as you frequently do when transporting ALV list layouts) is sufficient to move the contents over. Moreover, the binary blob does not seem to contain much information about the data structure: if you change the source or target structure in an incompatible way, you risk losing data. Direct access from the database layer is not possible; this is stated in numerous places throughout the documentation. It might be possible to reverse-engineer the marshalling/unmarshalling algorithm, but why bother when you have the IMPORT statement at hand to access the contents?
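Reading through IMPORT also gives you a defined failure mode for at least some structure changes. The sketch below wraps the IMPORT in TRY ... CATCH for CX_SY_IMPORT_MISMATCH_ERROR, the exception raised when the stored data does not fit the target's type. As before, the report name and ty_test are hypothetical, and not every incompatible change is guaranteed to surface as this exception.

```abap
REPORT z_cluster_read.

" Hypothetical line type; it must match what was exported.
TYPES: BEGIN OF ty_test,
         id   TYPE i,
         text TYPE c LENGTH 20,
       END OF ty_test,
       tt_test TYPE STANDARD TABLE OF ty_test WITH EMPTY KEY.

DATA ta_test TYPE tt_test.

TRY.
    " Read parameter TESTTAB through the ABAP layer.
    IMPORT testtab = ta_test FROM DATABASE ztest(aa) ID 'TEST'.
    DATA(rc) = sy-subrc.
    IF rc = 0.
      DATA(msg) = |Imported { lines( ta_test ) } rows|.
      WRITE / msg.
    ELSE.
      WRITE / 'No cluster stored under AA / TEST'.
    ENDIF.
  CATCH cx_sy_import_mismatch_error INTO DATA(err).
    " Stored data does not fit TY_TEST, e.g. after an incompatible change.
    DATA(err_text) = err->get_text( ).
    WRITE / err_text.
ENDTRY.
```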