Search code examples
sqlperformance

select * vs select column


If I just need 2/3 columns and I query SELECT * instead of providing those columns in select query, is there any performance degradation regarding more/less I/O or memory?

The network overhead might be present if I do select * without a need.

But in a select operation, does the database engine always pull atomic tuple from the disk, or does it pull only those columns requested in the select operation?

If it always pulls a tuple then I/O overhead is the same.

At the same time, there might be a memory consumption for stripping out the requested columns from the tuple, if it pulls a tuple.

So if that's the case, select someColumn will have more memory overhead than that of select *


Solution

  • It always pulls a tuple (except in cases where the table has been vertically segmented - broken up into columns pieces), so, to answer the question you asked, it doesn't matter from a performance perspective. However, for many other reasons, (below) you should always select specifically those columns you want, by name.

    It always pulls a tuple, because (in every vendors RDBMS I am familiar with), the underlying on-disk storage structure for everything (including table data) is based on defined I/O Pages (in SQL Server for e.g., each Page is 8 kilobytes). And every I/O read or write is by Page.. I.e., every write or read is a complete Page of data.

    Because of this underlying structural constraint, a consequence is that Each row of data in a database must always be on one and only one page. It cannot span multiple Pages of data (except for special things like blobs, where the actual blob data is stored in separate Page-chunks, and the actual table row column then only gets a pointer...). But these exceptions are just that, exceptions, and generally do not apply except in special cases ( for special types of data, or certain optimizations for special circumstances)
    Even in these special cases, generally, the actual table row of data itself (which contains the pointer to the actual data for the Blob, or whatever), it must be stored on a single IO Page...

    EXCEPTION. The only place where Select * is OK, is in the sub-query after an Exists or Not Exists predicate clause, as in:

       Select colA, colB
       From table1 t1
       Where Exists (Select * From Table2
                     Where column = t1.colA)
    

    EDIT: To address @Mike Sherer comment, Yes it is true, both technically, with a bit of definition for your special case, and aesthetically. First, even when the set of columns requested are a subset of those stored in some index, the query processor must fetch every column stored in that index, not just the ones requested, for the same reasons - ALL I/O must be done in pages, and index data is stored in IO Pages just like table data. So if you define "tuple" for an index page as the set of columns stored in the index, the statement is still true.
    and the statement is true aesthetically because the point is that it fetches data based on what is stored in the I/O page, not on what you ask for, and this true whether you are accessing the base table I/O Page or an index I/O Page.

    For other reasons not to use Select *, see Why is SELECT * considered harmful? :