INTEGER PRIMARY KEY vs rowid in SQLite


I am trying to import some spatial data (OSM) into an SQLite database. The SQLite reference states that an INTEGER PRIMARY KEY becomes an alias for the rowid (if WITHOUT ROWID is not specified). Just to be sure, I created my main table in two different manners:

CREATE TABLE points (tags BLOB NOT NULL,
                     lon INTEGER NOT NULL,
                     lat INTEGER NOT NULL)

vs.

CREATE TABLE points (id INTEGER PRIMARY KEY,
                     tags BLOB NOT NULL,
                     lon INTEGER NOT NULL,
                     lat INTEGER NOT NULL)

I expected the same results, but after running the application twice, my two database files clearly differ in size: the version with the explicit primary key takes about 100 MB more disk space (1.5 GB vs. 1.4 GB). My insert statements are identical except that one uses "id" and the other uses "rowid" as the destination column for the point ID.

Does anyone have a clue where this massive difference in size comes from? Thanks in advance.


Solution

  • Having an alias for the rowid appears to add an overhead of about one byte per row, which I believe is explained by the following passage from the file format documentation :-

    "When an SQL table includes an INTEGER PRIMARY KEY column (which aliases the rowid) then that column appears in the record as a NULL value. SQLite will always use the table b-tree key rather than the NULL value when referencing the INTEGER PRIMARY KEY column." (Database File Format, section 2.3, Representation Of SQL Tables)
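To illustrate where that byte could come from, here is a minimal sketch that estimates the size of a single table record according to the documented record format: a header holding one serial-type varint per column, followed by the column values. A NULL has serial type 0 and no body bytes, so it costs only its one-byte header entry. The helper names (`varint_len`, `int_serial_type`, `record_size`) are made up for this illustration and are not part of SQLite; the sketch ignores overflow pages and the per-cell overhead outside the record.

```python
def varint_len(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a SQLite varint (1 to 9)."""
    n = 1
    while value >= 0x80 and n < 9:
        value >>= 7
        n += 1
    return n


def int_serial_type(v: int) -> tuple[int, int]:
    """Return (serial type, body bytes) for an integer value."""
    if v == 0:
        return 8, 0  # the constant 0 needs no body bytes
    if v == 1:
        return 9, 0  # the constant 1 needs no body bytes
    for serial_type, size in ((1, 1), (2, 2), (3, 3), (4, 4), (5, 6), (6, 8)):
        bits = size * 8
        if -(1 << (bits - 1)) <= v < (1 << (bits - 1)):
            return serial_type, size
    raise ValueError("integer does not fit in 64 bits")


def record_size(columns: list) -> int:
    """Estimated record size in bytes for a row of None, int, or bytes values."""
    header_entries = 0
    body = 0
    for value in columns:
        if value is None:
            serial_type, size = 0, 0  # NULL: header entry only
        elif isinstance(value, int):
            serial_type, size = int_serial_type(value)
        elif isinstance(value, bytes):
            size = len(value)
            serial_type = 12 + 2 * size  # BLOB serial types are even and >= 12
        else:
            raise TypeError("only None, int and bytes are handled in this sketch")
        header_entries += varint_len(serial_type)
        body += size
    # The header starts with a varint holding the total header length.
    header = header_entries + 1
    if header > 127:
        header += 1  # simplification: assume at most a two-byte length varint
    return header + body


# A row shaped like the question's table: 4-byte blob plus two large integers.
plain_row = [b"\x00\x00\x00\x00", 2**62, -(2**62)]
# With "id INTEGER PRIMARY KEY" the alias column is stored as NULL in the record.
alias_row = [None] + plain_row

if __name__ == "__main__":
    print("plain record:", record_size(plain_row), "bytes")
    print("alias record:", record_size(alias_row), "bytes")
    print("difference:", record_size(alias_row) - record_size(plain_row), "byte(s)")
```

Under these assumptions the only difference between the two rows is the extra serial-type entry for the NULL placeholder, which matches the estimate of one byte per row.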

    The estimate of 1 byte per row looks about right according to the following test :-

    Two databases were created, one for each of the two table definitions, and each was loaded with 1,000,000 rows using the following SQL :-

    For the First :-

    DROP TABLE IF EXISTS points;
    CREATE TABLE IF NOT EXISTS points (tags BLOB NOT NULL, lon INTEGER NOT NULL, lat INTEGER NOT NULL);
    WITH RECURSIVE counter(tags,lon,lat) AS (SELECT x'00000000', 0,0 UNION ALL SELECT tags, random() AS lon, random() AS lat FROM counter LIMIT 1000000)
    INSERT INTO points (tags,lon,lat) SELECT * FROM counter;
    SELECT * FROM points;
    VACUUM
    

    For the Second (with an alias of the rowid):-

    DROP TABLE IF EXISTS points;
    CREATE TABLE IF NOT EXISTS points (id INTEGER PRIMARY KEY, tags BLOB NOT NULL, lon INTEGER NOT NULL, lat INTEGER NOT NULL);
    WITH RECURSIVE counter(tags,lon,lat) AS (SELECT x'00000000', 0,0 UNION ALL SELECT tags, random() AS lon, random() AS lat FROM counter LIMIT 1000000)
    INSERT INTO points (tags,lon,lat) SELECT * FROM counter;
    SELECT * FROM points;
    VACUUM
    

    The resultant file sizes were 29484 KB and 30600 KB respectively.

    That is a difference of 30600 - 29484 = 1,116 KB. Multiplied by 1024 this is 1,142,784 bytes, or roughly 1.14 bytes for each of the 1,000,000 rows (not that far off 1 byte per row; page boundaries and free space probably account for the discrepancy).

    • Note that the VACUUM command made no difference (as these were new tables, there was no expectation that it would).
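For anyone who wants to repeat the comparison, the test above can be scripted roughly as follows with Python's standard sqlite3 module. This is only a sketch: the file names, the `build` and `compare` helpers, and the reduced default row count (100,000 instead of 1,000,000, to keep the run short) are choices made for this example, and the exact sizes will vary with page size and SQLite version.

```python
import os
import sqlite3
import tempfile

PLAIN_SQL = (
    "CREATE TABLE points (tags BLOB NOT NULL, "
    "lon INTEGER NOT NULL, lat INTEGER NOT NULL)"
)
ALIAS_SQL = (
    "CREATE TABLE points (id INTEGER PRIMARY KEY, tags BLOB NOT NULL, "
    "lon INTEGER NOT NULL, lat INTEGER NOT NULL)"
)
# Same recursive CTE as in the answer; {rows} is filled in with an integer.
FILL_SQL = """
WITH RECURSIVE counter(tags, lon, lat) AS (
    SELECT x'00000000', 0, 0
    UNION ALL
    SELECT tags, random(), random() FROM counter LIMIT {rows}
)
INSERT INTO points (tags, lon, lat) SELECT * FROM counter
"""


def build(path: str, create_sql: str, rows: int) -> int:
    """Create the table, load `rows` rows, and return the file size in bytes."""
    con = sqlite3.connect(path)
    try:
        con.execute(create_sql)
        con.execute(FILL_SQL.format(rows=int(rows)))
        con.commit()
    finally:
        con.close()
    return os.path.getsize(path)


def compare(rows: int = 100_000) -> tuple[int, int, float]:
    """Return (plain size, alias size, extra bytes per row) for the two schemas."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = build(os.path.join(tmp, "plain.db"), PLAIN_SQL, rows)
        alias = build(os.path.join(tmp, "alias.db"), ALIAS_SQL, rows)
    return plain, alias, (alias - plain) / rows


if __name__ == "__main__":
    plain, alias, per_row = compare()
    print(f"without alias: {plain} bytes")
    print(f"with alias:    {alias} bytes")
    print(f"extra bytes per row: {per_row:.2f}")
```

Dividing the size difference by the row count gives the per-row overhead directly, which is the same calculation as the 1,142,784 / 1,000,000 figure above.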