Search code examples
databaseperformancedatabase-designduckdb

Create DuckDB table with primary key from parquet


I am trying to set up a simple but large DuckDB database with a single column of unique values as read from a parquet file. For faster inference of single-point existence checking (WHERE id = test_id), I want to convert the parquet file to a DuckDB (as recommended) and add a primary key to it.

I tried it as follows:

CREATE OR REPLACE TABLE data (
    id UUID PRIMARY KEY
)
AS SELECT DISTINCT id FROM 'my-parquet-file.parquet'

but got the error:

duckdb.duckdb.ParserException: Parser Error: syntax error at or near "AS"

So what is the correct way of setting up the table for efficient single-point existence checking on huge tables?


Solution

  • You can add primary key via alter table ... add primary key ... after creation:

    create or replace table data
    as
    select distinct id
    from 'my-parquet-file.parquet';
    
    alter table data add primary key (id);
    

    or create table with primary key and then insert data into it:

    create or replace table data (
        id uuid primary key
    );
    
    insert into data (id)
    select distinct id
    from 'my-parquet-file.parquet';