Is there a simpler way to write generate_series code for generating multiple columns?


The generate_series function is one of those Very Cool Features in Postgres that I find confusing. It's dead simple if you want a single series of numbers, incremented timestamps, etc., but not so obvious (to me) when you want to populate rows of test data with multiple columns. I need something like that today, so I figured I'd give it another try. The code below does generate what I'm after: a counter and a matching timestamp on the hour. I ended up using a CTE for the number series, which I then reuse in the timestamp calculation.

Is there a simpler way?

And, yes, it would be easy to generate mock data like this in another language, or even Excel, etc. But I'm keen to figure out how to do things in Postgres. This sample is completely synthetic, but there are plenty of times when some part of the data would be coming from existing data in Postgres.

Thanks for any suggestions.

CREATE TABLE IF NOT EXISTS api.sample_series (
    id INTEGER NOT NULL DEFAULT 0,
    stamp timestamptz NOT NULL DEFAULT NULL,

CONSTRAINT sample_series_id_pkey
    PRIMARY KEY (id)
);

-- Generate a series of numbers for each hour of the year. It's okay if I'm off by one here, I'm only running a test. So 1 through 8760.

WITH 
counter as (
   select * from generate_series(1,8760,1) as hour_number),

-- Now use that series to generate two columns, the original counter/hour_number and a calculated timestamp to match
values as (
   select hour_number as id,
           '2019-01-01 00:00'::timestamp + interval '1' HOUR * hour_number as stamp
   from counter)

-- You've now got 8,760 numbers and timestamps, push them into the table.
INSERT INTO api.sample_series (id,stamp)

SELECT id,stamp from values;


Solution

  • The two CTEs are unneeded and in my opinion don't do anything to improve readability.

    So the simpler way would be:

    INSERT INTO api.sample_series (id,stamp)
        select
            hour_number as id,
            '2019-01-01 00:00'::timestamp + interval '1 HOUR' * hour_number as stamp
        from generate_series(1,8760,1) as hour_number;


    The text of the expression for "stamp" is unfortunately quite long, but I don't think hiding it behind a CTE accomplishes much.
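
If the long "stamp" expression is the main annoyance, one possible alternative (not part of the answer above, just a sketch) is to let generate_series produce the timestamps directly and have WITH ORDINALITY number the rows. This assumes PostgreSQL 9.4 or later. The alias t(stamp, id) is my own naming, and the start and end values are chosen so that id 1 through 8760 line up with the same hours as in the question.

```sql
-- Sketch only: generate the hourly timestamps directly and let
-- WITH ORDINALITY add a row counter as the last column.
-- The alias t(stamp, id) is an assumption, not from the original post.
SELECT t.id,      -- ordinality column: 1, 2, 3, ... (bigint)
       t.stamp    -- hourly timestamps produced by generate_series
FROM generate_series(
         '2019-01-01 01:00'::timestamp,  -- first hour, pairs with id = 1
         '2020-01-01 00:00'::timestamp,  -- last hour, pairs with id = 8760
         interval '1 hour'
     ) WITH ORDINALITY AS t(stamp, id);
```

To load the table, put the question's `INSERT INTO api.sample_series (id,stamp)` line in front of the SELECT. The ordinality column is a bigint, which is cast to the INTEGER id column on insert. Whether this reads better than multiplying an interval by the counter is a matter of taste.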