Search code examples
postgresqltextb-tree

How to insert long text (>3K chars) in columns with unique constraint


There are some problems with inserting text in postgres that is too long. when I have a simple table with text, I can insert text as long as I like (I tested up to 40K chars). However, when I add a unique constraint I begin to encounter a strange btree problem, see the minimal working example (MWE) below

MWE:

#!/bin/bash                                                                                                                                                                                           
                                                                             
 N=4096 # Aiming for a URL of length 4,096 characters                        
 DB_NAME='foo'                                                               
 # Generate random N character alphanumeric string of lenght 4,096           
 URL="http://www.$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w $(($N-15)) | head -n 1).com"
                                                                             
 # Case 1 we have a table ('website') which has a single column with no      
 # constraints                                                               
 TABLE_NAME='website'                                                        
 sudo -u postgres psql -c "drop database if exists $DB_NAME;"                
 sudo -u postgres psql -c "create database $DB_NAME;"                        
 sudo -u postgres psql -d $DB_NAME -c "drop table if exists $TABLE_NAME;"    
 sudo -u postgres psql -d $DB_NAME -c "create table $TABLE_NAME (url text);" 
 sudo -u postgres psql -d $DB_NAME -c "insert into $TABLE_NAME (url) values ('$URL');"
                                                                             
                                                                             
 # Case 2 we have a table ('website2') which has a single column which must be
 # unique                                                                    
 TABLE_NAME='website2'                                                       
 sudo -u postgres psql -c "drop database if exists $DB_NAME;"                
 sudo -u postgres psql -c "create database $DB_NAME;"                        
 sudo -u postgres psql -d $DB_NAME -c "drop table if exists $TABLE_NAME;"    
 sudo -u postgres psql -d $DB_NAME -c "create table $TABLE_NAME (url text unique);"
 sudo -u postgres psql -d $DB_NAME -c "insert into $TABLE_NAME (url) values ('$URL');"

Output:

$ ./test.sh                                                                 
 DROP DATABASE                                                                                                                                                                                         
 CREATE DATABASE                                                             
 NOTICE:  table "website" does not exist, skipping                           
 DROP TABLE                                                                  
 CREATE TABLE                                                                
 INSERT 0 1                                                                  
 DROP DATABASE                                                               
 CREATE DATABASE                                                             
 NOTICE:  table "website2" does not exist, skipping                          
 DROP TABLE                                                                  
 CREATE TABLE                                                                
 ERROR:  index row size 4112 exceeds btree version 4 maximum 2704 for index "website2_url_key"
 DETAIL:  Index row references tuple (0,1) in relation "website2".           
 HINT:  Values larger than 1/3 of a buffer page cannot be indexed.           
 Consider a function index of an MD5 hash of the value, or use full text indexing.

Question: What should I do if I have extremely long text (>3K chars)?

  1. Is there some way to disable this btree error?
  2. Should I remove the unique constraint and check at the application level only?
  3. Should I compress all my URLs?
  4. Is it simply impossible to achieve this via PSQL?

Solution

  • Better yet, cast the md5 to type uuid:

    CREATE UNIQUE INDEX unique_url_index ON $TABLE_NAME ((md5(url)::uuid));  -- parens required
    

    Makes the index smaller and faster. See: