Tags: sql, postgresql, aggregate, greatest-n-per-group, window-functions

Run Postgres query that groups by one field and sorts by another


I have a PostgreSQL table with the following relevant fields:

url
title
created_at

There can be many rows that contain identical URLs but different titles. Here are some sample rows:

www.nytimes.com | The New York Times         | 2016-01-01 00:00:00
www.wsj.com     | The Wall Street Journal    | 2016-01-03 15:32:13
www.nytimes.com | The New York Times Online  | 2016-01-06 07:19:08
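
To reproduce the sample rows, a minimal setup might look like the sketch below. The table name headlines comes from the queries in the question; the column types and NOT NULL constraints are assumptions, since the question does not state them.

```sql
-- Hypothetical schema: column types and constraints are assumed,
-- not given in the question.
CREATE TABLE headlines (
    url        text      NOT NULL,
    title      text,
    created_at timestamp NOT NULL
);

-- The three sample rows from the question.
INSERT INTO headlines (url, title, created_at) VALUES
    ('www.nytimes.com', 'The New York Times',        '2016-01-01 00:00:00'),
    ('www.wsj.com',     'The Wall Street Journal',   '2016-01-03 15:32:13'),
    ('www.nytimes.com', 'The New York Times Online', '2016-01-06 07:19:08');
```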

I'm trying to obtain an output that lists the following fields:

1) url
2) title that corresponds to the highest value of created_at
3) count of all titles for that unique url

So, output rows for the above sample would look something like this:

www.nytimes.com | The New York Times Online | 2
www.wsj.com     | The Wall Street Journal   | 1

Based on the numerous SO posts I've read on similar questions, it looks like my best option for obtaining the first two fields (url and latest title) would be to use DISTINCT ON:

select distinct on (url) url, title from headlines order by url, created_at desc 

Likewise, to obtain the first and third fields (url and count of all titles), I could simply use GROUP BY:

select url, count(title) from headlines group by url

What I can't figure out is how to combine the two approaches so that a single query returns all three values.

(Edited to provide more clarity.)


Solution

  • Try:

    select t1.url, t2.title, t1.cnt
    from (
      -- t1: number of titles per url
      select url, count(title) as cnt
      from headlines
      group by url
    ) t1
    join (
      -- t2: latest title per url (first row per url by created_at desc)
      select distinct on (url) url, title
      from headlines
      order by url, created_at desc
    ) t2 on t1.url = t2.url
    order by t1.url
    

    This joins both of your queries on url: t1 supplies the count per url and t2 supplies the latest title.
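
A possible alternative, offered as a sketch rather than as part of the answer above: PostgreSQL evaluates window functions before DISTINCT ON is applied, so the per-url count can be computed in the same pass that picks the latest title, without a join. It assumes the same headlines table and columns used in the question.

```sql
-- Sketch of a single-pass variant using a window function.
-- count(title) over (partition by url) is computed for every row first;
-- distinct on (url) then keeps only the newest row per url.
select distinct on (url)
       url,
       title,
       count(title) over (partition by url) as cnt
from headlines
order by url, created_at desc;
```

For the three sample rows in the question, this returns www.nytimes.com with "The New York Times Online" and a count of 2, and www.wsj.com with "The Wall Street Journal" and a count of 1. Whether it performs better than the join depends on the data and indexes, so it is worth comparing both with EXPLAIN ANALYZE.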
