Tags: postgresql, postgresql-9.3

Slow Postgres 9.3 queries


I'm trying to figure out if I can speed up two queries on a database storing email messages. Here's the table:

\d messages;
                             Table "public.messages"
     Column     |  Type   |                       Modifiers
----------------+---------+-------------------------------------------------------
 id             | bigint  | not null default nextval('messages_id_seq'::regclass)
 created        | bigint  |
 updated        | bigint  |
 version        | bigint  |
 threadid       | bigint  |
 userid         | bigint  |
 groupid        | bigint  |
 messageid      | text    |
 date           | bigint  |
 num            | bigint  |
 hasattachments | boolean |
 placeholder    | boolean |
 compressedmsg  | bytea   |
 revcount       | bigint  |
 subject        | text    |
 isreply        | boolean |
 likes          | bytea   |
 isspecial      | boolean |
 pollid         | bigint  |
 username       | text    |
 fullname       | text    |
Indexes:
    "messages_pkey" PRIMARY KEY, btree (id)
    "idx_unique_message_messageid" UNIQUE, btree (groupid, messageid)
    "idx_unique_message_num" UNIQUE, btree (groupid, num)
    "idx_group_id" btree (groupid)
    "idx_message_id" btree (messageid)
    "idx_thread_id" btree (threadid)
    "idx_user_id" btree (userid)

The output of

SELECT relname, relpages, reltuples::numeric, pg_size_pretty(pg_table_size(oid))
FROM pg_class WHERE oid = 'messages'::regclass;

is:

 relname  | relpages | reltuples | pg_size_pretty
----------+----------+-----------+----------------
 messages |  1584913 |   7337880 | 32 GB

Some possibly relevant postgres config values:

shared_buffers = 1536MB
effective_cache_size = 4608MB
work_mem = 7864kB
maintenance_work_mem = 384MB

Here is the EXPLAIN ANALYZE output for each query:

explain analyze SELECT * FROM messages WHERE groupid=1886 ORDER BY id ASC LIMIT 20 offset 4440;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=479243.63..481402.39 rows=20 width=747) (actual time=14167.374..14167.408 rows=20 loops=1)
   ->  Index Scan using messages_pkey on messages  (cost=0.43..19589605.98 rows=181490 width=747) (actual time=14105.172..14167.188 rows=4460 loops=1)
         Filter: (groupid = 1886)
         Rows Removed by Filter: 2364949
 Total runtime: 14167.455 ms
(5 rows)

The second query:

explain analyze SELECT * FROM messages WHERE groupid=1886 ORDER BY created ASC LIMIT 20 offset 4440;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=538650.72..538650.77 rows=20 width=747) (actual time=671.983..671.992 rows=20 loops=1)
   ->  Sort  (cost=538639.62..539093.34 rows=181490 width=747) (actual time=670.680..671.829 rows=4460 loops=1)
         Sort Key: created
         Sort Method: top-N heapsort  Memory: 7078kB
         ->  Bitmap Heap Scan on messages  (cost=7299.11..526731.31 rows=181490 width=747) (actual time=84.975..512.969 rows=200561 loops=1)
               Recheck Cond: (groupid = 1886)
               ->  Bitmap Index Scan on idx_unique_message_num  (cost=0.00..7253.73 rows=181490 width=0) (actual time=57.239..57.239 rows=203423 loops=1)
                     Index Cond: (groupid = 1886)
 Total runtime: 672.787 ms
(9 rows)

This is on an SSD-backed instance with 8 GB of RAM; the load average is usually around 0.15.

I'm definitely no expert. Is this a case of the data just being spread throughout the disk? Is my only solution to use CLUSTER?

One thing I don't understand is why it uses idx_unique_message_num as the index for the second query. And why is ordering by id so much slower?


Solution

  • If there are many records with groupid=1886 (from a comment: there are 200,563), reaching rows at a given OFFSET within a sorted subset requires producing that whole ordering first, by sorting or by an equivalent heap algorithm (the top-N heapsort in the second plan), which is slow. The first query is slow for a related reason: the planner walks messages_pkey in id order and filters out non-matching rows, discarding about 2.36 million rows before it collects the 4,460 it needs.

    This can be solved by adding composite indexes, in this case one on (groupid, id) and another on (groupid, created). Either index returns a group's rows already in the requested order, so the planner can skip ahead without sorting or filtering; see the sketch after this list.

    From a comment: this indeed helped, bringing the runtime down to 5-10 ms.
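
A minimal sketch of those indexes, assuming the write overhead of two extra indexes on a 32 GB table is acceptable (the index names here are illustrative, not from the original post):

-- Composite indexes so the planner can read one group's rows
-- already ordered by id or by created, with no separate sort step.
-- CONCURRENTLY avoids blocking writes while the large table is indexed
-- (it cannot run inside a transaction block).
CREATE INDEX CONCURRENTLY idx_messages_groupid_id ON messages (groupid, id);
CREATE INDEX CONCURRENTLY idx_messages_groupid_created ON messages (groupid, created);

-- Re-check the plans: each query should now use the matching composite
-- index instead of a pkey scan with a filter or a bitmap-scan-plus-sort.
EXPLAIN ANALYZE
SELECT * FROM messages
WHERE groupid = 1886
ORDER BY id ASC
LIMIT 20 OFFSET 4440;

Note that a btree on (groupid, id) also serves plain groupid lookups, so the existing idx_group_id likely becomes redundant and could be dropped to save write overhead.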