postgresql - are my indexes or column types slowing down my query?

I have a table I created locally to use some of PG's window functions on a dataset that has about 4 million rows (originally a text file). Each row corresponds to a customer order.

CREATE TABLE orders
(
  orderid integer,
  customerid integer,
  orderdate date,
  status text,
  amount money,
  tax money,
  customername text,
  customerstate text

I have the database running locally in a i7 8gb RAM Windows 8 machine. I have btree indexes (indices?) on orderid, customerid and orderdate.

When I run the following query, it takes 300 seconds (appx). I was hoping with some basic tuning I could get it down to a minute, but I'm not a DBA. Anyone have tips?

select orderid, customername, orderdate, 
rank() OVER (PARTITION BY customername ORDER BY orderdate ASC) as cust_ord_nbr
from orders

Solution

Covering index

Partition by customerid, like @Daniel commented. integer is smaller and cheaper to sort. If you don't need customername in the result, replace it with customerid completely.

A multicolumn index can help (like @wildplasser commented). If it's a (mostly) read-only table, a "covering" index that allows index-only scans would be even faster - especially if included columns are small:

CREATE INDEX orders_nbr_idx ON orders (customerid, orderdate, orderid);

Adding orderid to the index only makes sense if you get index-only scans out of it. If you need the customername, add that, too. More:

How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?

If it's (mostly) a read-only table, execute the expensive query once and save the snapshot as MATERIALIZED VIEW for reuse ...

Peanuts

You can do a couple of small things to reduce the memory footprint. After playing column tetris, this would save 0-7 bytes per row currently lost to padding:

CREATE TABLE orders (
  orderid integer,
  customerid integer,
  amount money,
  tax money,
  orderdate date,
  status text,
  customername text,
  customerstate text
  );

If you write the result to another table (or MATERIALIZED VIEW), it would save a bit to optimize the query in a similar fashion. rank() produces a bigint, by casting to int you save 8 bytes per row (4 + 4 padding):

SELECT orderid, customername, orderdate
    -- orderid, customerid, orderdate  -- good enough?
     , rank() OVER (PARTITION BY customerid
                    ORDER BY orderdate)::int AS cust_ord_nbr
FROM   orders;