Search code examples
duckdb

How to add new column to duckdb query result based on a function that uses one of the column


I've a csv file named s3-durations.csv which consists of two headers called duration(numeric type) and timestamp (timestamptz).

This is how I read the csv into duckdb -

SELECT *
  FROM read_csv('s3-durations.csv',
      delim = ',',
      header = true,
      columns = {
          'duration': 'numeric',
          'timestamp': 'timestamptz'
      });

I want to add new column to the query result called start_time which is mathematically timestamp - INTERVAL $duration MILLISECONDS.

I'm trying to achieve that using below query -

SELECT duration, timestamp, timestamp - INTERVAL duration MILLISECONDS
  FROM read_csv('s3-durations.csv',
      delim = ',',
      header = true,
      columns = {
          'duration': 'numeric',
          'timestamp': 'timestamptz'
      });

The error is -

Parser Error: syntax error at or near "MILLISECONDS" LINE 1: ...mestamp, timestamp - INTERVAL duration MILLISECONDS ^

But the below query is working fine which uses some constant.

SELECT duration, timestamp, timestamp - INTERVAL 5 MILLISECONDS
  FROM read_csv('s3-durations.csv',
      delim = ',',
      header = true,
      columns = {
          'duration': 'numeric',
          'timestamp': 'timestamptz'
      });

How to fix my duckdb query?


Solution

  • It can be done like this:

      with d as ( SELECT *
        FROM read_csv('s3-durations.csv',
            delim = ',',
            header = true,
            columns = {
                'duration': 'numeric',
                'timestamp': 'timestamptz'
            }))
      select 
       duration, 
       timestamp, 
       timestamp - (duration || ' MILLISECONDS')::interval as start 
    from d;
    

    or like this:

      with d as ( SELECT *
        FROM read_csv('s3-durations.csv',
            delim = ',',
            header = true,
            columns = {
                'duration': 'numeric',
                'timestamp': 'timestamptz'
            }))
      select 
       duration, 
       timestamp, 
       timestamp - to_milliseconds(duration::int) as start 
    from d;