sql postgresql greatest-n-per-group postgresql-13 distinct-on

Is SELECT DISTINCT ON (col) * valid?

SELECT DISTINCT ON (some_col)
*
FROM my_table

I'm wondering if this is valid and will work as expected. Meaning, will this return all columns from my_table, based on distinct some_col? I've read the Postgres docs and don't see any reason why this wouldn't work as expected, but have read old comments here on SO which state that columns need to be explicitly listed when using distinct on.

I do know it's best practice to explicitly list columns, and also to use order by when doing the above.

Background that you probably don't need or care about

For background and the reason I ask, is we are migrating from MySQL to Postgres. MySQL has a very non-standards compliant "trick" which allows a SELECT * ... GROUP BY which allows one to easily select * based on a group by. Previous answers and comments about migrating this non-standard-compliant trick to Postgres are murky at best.

Solution

SELECT DISTINCT ON (some_col) *
FROM   my_table;

I'm wondering if this is valid

Yes. Typically, you want ORDER BY to go with it to determine which row to pick from each set of peers. But choosing an arbitrary row (without ORDER BY) is a valid (and sometimes useful!) application. You just need to know what you are doing. Maybe add a comment for the afterworld?

See:

Select first row in each GROUP BY group?

will this return all columns from my_table, based on distinct some_col?

It will return all columns. One arbitrary row per distinct value of some_col.

Note how I used the word "arbitrary", not "random". Returned rows are not chosen randomly at all. Just arbitrarily, depending on current implementation details. Typically the physically first row per distinct value, but that depends.

I do know it's best practice to explicitly list columns.

That really depends. Often it is. Sometimes it is not. Like when I want to get all columns to match a given row type.