If I have this 400M rows table:
create table my_table
( id int,
filled tinyint,
content text not null
) engine=myisam
can I prevent the extra access of content (due to column type text) with:
select id,if(filled,content,'') from my_table
(filled is 0 or 1)?
Or if content can be null, would
select id,ifnull(content,'') from my_table
be better (still in term of performance since the null overhead would then occur)?
Your question is imprecise. Might a row have filled = 1
and content = 'some text'
?
If yes, then the IF(filled, content, '')
makes sense.
If no, then get rid of filled
and simply set content
to ''
. The cost of accessing an empty string for content
is tiny.
How big is content
typically? If is usually only hundreds of characters, the performance difference is small.
Don't use MyISAM, switch to InnoDB.
Do you have a lot of DELETEs
or UPDATEs
? If so, the .MYD file of MyISAM may be fragmented. This implies that content
may be scattered around the disk. If you have no updates or deletes, then content
is "right there", hence virtually free to access.
In contrast, InnoDB sometimes puts TEXT
and BLOB
"off-record".
But, which is better or worse? The off-record (in some situations) versus the fragmentation (in some situations).
Are you fetching just one row? Or lots of such rows? This is another area where MyISAM and InnoDB are implemented quite differently. MyISAM needs to step over content
to get to the "next" row. (I won't go into a lengthy discussion of where the "next" row is likely to be.)
Bottom line: Don't worry about optimizing, but do change to InnoDB (for a lot of reasons).
Where is the "next" row likely to be?
MyISAM... Rows are added to the .MYD in the order of inserts. But, after updates or deletes, new/changed rows tend to fill in the gaps first, leaving appending to the file as a last resort.
InnoDB... Rows are in a B+Tree ordered by the PRIMARY KEY
. The BTree is blocked with 16KB blocks. You have a TEXT
column, so it gets more complex...
For "small" TEXT columns, several rows will fit into 16KB. If, say, content
is 1KB strings, up to about 15 rows will be put in a block. If they are much bigger, then content
will be stored "off-record".
The rows in a block are essentially all equidistant from each other. (The big effort is loading the block into cache and locating the block.)
The "next" (or "previous") block is just a disk access away -- see the "+" in "B+Tree".
When a block overflows, it is "split" into two blocks, some rows go into each. The new block is allocated somewhere, rarely adjacent.
With SSDs, all blocks are "equidistant" from each other. With HDD, rotational delays and arm motion complicate the timing of which block might be "closer" than some other block.