Tags: sql, postgresql

SQL table with thousands of columns


I need to store data (floats) from 20,000 sensors once per second. I originally wanted to create the following table:

time                 sensor 1  sensor 2  ...  sensor 20000
2024-09-06 13:00:00  1.2       5.3       ...  2.0

But then I found that a table cannot have more than 1600 columns in PostgreSQL. What's the best practice for storing this kind of data? Should I split it across multiple tables or switch to another type of DB?

All 20,000 sensor values are read and inserted together.

I need to query up to 100 of them per second to plot trend charts.


Solution

  • Here's how much space it takes to store 1 minute of randomly generated per-second readings from 20k sensors with 10% sparsity (all demos share the same setseed(), so the random data they save is exactly the same):

    approach                        size for 1 minute
    32 tables, 625 columns each     16MB
    numeric[] SQL array             11MB
    jsonb array                     11MB
    jsonb object                    15MB
    json array                      14MB
    json object                     18MB
    hstore                          38MB
    Entity-Attribute-Value          55MB

    The approach names link to documentation and the sizes link to db<>fiddle demos you can play around with.

    In each case you can save space by reducing the precision and scale of your readings, e.g. by using numeric(4,2). That shrinks the numeric[] array from 11MB down to 2.5MB, while the EAV table only shrinks from 55MB to 46MB, which shows how much of it is pure overhead: the time and sensor identifiers are duplicated on every row.
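
    A minimal sketch of that reduced-precision variant of the array demo below (assuming your readings fit in numeric(4,2), i.e. at most two digits on either side of the decimal point):

    create table your_signals(measured_at timestamptz, reading_values numeric(4,2)[]);
    select setseed(.42);--makes this test repeatable
    insert into your_signals values
      (now(),(select array_agg((case when .9>random() then random() end)::numeric(4,2))
              from generate_series(1,2e4)n2) );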

    Space consumption is only one of the factors, but you can use these as a starting point for further tests.
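
    The first row of the comparison, 32 tables with 625 columns each, stays under the 1600-column limit by spreading the sensors over multiple tables that share the timestamp. A minimal sketch of one such table (the names are my assumption, not from the original demos):

    create table your_signals_part01(
           measured_at timestamptz
         , s1 numeric
         , s2 numeric
         --, ...and so on through s625; parts 02..32 hold the remaining sensors
    );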

    numeric[]:

    create table your_signals(measured_at timestamptz, reading_values numeric[]);
    select setseed(.42);--makes this test repeatable
    insert into your_signals values
      (now(),(select array_agg(case when .9>random() then random() end)--null ~10% of the time gives the sparsity
              from generate_series(1,2e4)n2) );
    select measured_at
      ,reading_values[1] as s1
      ,reading_values[5] as s5
      ,reading_values[9999] as s9999
      ,reading_values[20000] as s20000
    from your_signals
    
    measured_at                   | s1                | s5              | s9999             | s20000
    2024-09-07 12:46:21.978572+01 | 0.362470663311556 | 0.5754996219675 | 0.800965200844344 | 0.906566857051784
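
    For the stated requirement of charting up to 100 sensors at a time, one possible way to read many array positions in a single pass is to unnest with ordinality and filter on the ordinal (a sketch under my assumptions, not part of the original demos; substitute your sensor ids and time window):

    select measured_at, sensor_id, reading_value
    from your_signals
    cross join unnest(reading_values) with ordinality as u(reading_value, sensor_id)
    where sensor_id in (1,5,9999,20000)
      and measured_at >= now() - interval '1 hour';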

    jsonb array:

    create table your_signals(measured_at timestamptz, reading_values jsonb);
    select setseed(.42);--makes this test repeatable
    insert into your_signals values
      (now(),(select jsonb_agg(case when .9>random() then random()::numeric(30,15) end)
              from generate_series(1,2e4)n2)
      );
    select measured_at
      ,reading_values[0] as s1
      ,reading_values[4] as s5
      ,reading_values[9998] as s9999
      ,reading_values[19999] as s20000
    from your_signals
    
    measured_at                   | s1                | s5                | s9999             | s20000
    2024-09-07 12:55:51.234168+01 | 0.362470663311556 | 0.575499621967500 | 0.800965200844344 | 0.906566857051784
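
    The subscripted values above come back as jsonb; if your charting layer needs plain numbers, you can cast via the text-extraction operator ->> instead (a minimal sketch; note that jsonb arrays are 0-based):

    select measured_at
          ,(reading_values->>0)::numeric     as s1
          ,(reading_values->>19999)::numeric as s20000
    from your_signals;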

    jsonb object:

    create table your_signals(measured_at timestamptz, reading_values jsonb);
    select setseed(.42);--makes this test repeatable
    insert into your_signals values
      (now(),(select jsonb_object_agg('s'||n2,random()::numeric(30,15))filter(where .9>random()) 
              from generate_series(1,2e4)n2) 
      );
    select measured_at
      ,reading_values['s1'] as s1
      ,reading_values['s5'] as s5
      ,reading_values['s9999'] as s9999
      ,reading_values['s20000'] as s20000
    from your_signals
    
    measured_at                   | s1                | s5                | s9999             | s20000
    2024-09-07 12:59:48.442463+01 | 0.362470663311556 | 0.575499621967500 | 0.800965200844344 | 0.906566857051784
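
    Because the filter(where .9>random()) clause skips null readings entirely, a sensor that didn't report simply has no key in the object. To restrict a query to rows where a given sensor did report, the ? key-existence operator works (a minimal sketch, my assumption rather than part of the original demos):

    select measured_at, (reading_values->>'s9999')::numeric as s9999
    from your_signals
    where reading_values ? 's9999';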

    entity-attribute-value:

    create table your_signals(
           measured_at timestamptz
         , source_sensor smallint
         , reading_value numeric);
    
    select setseed(.42);--makes this test repeatable
    insert into your_signals 
    select now(), n, random() 
    from generate_series(1,2e4)n
    where .9>random();
    
    --requires a pivot to view sensors in columns
    select measured_at
          ,min(reading_value)filter(where source_sensor=1)     as s1
          ,min(reading_value)filter(where source_sensor=5)     as s5
          ,min(reading_value)filter(where source_sensor=9999)  as s9999
          ,min(reading_value)filter(where source_sensor=20000) as s20000
    from your_signals
    where source_sensor in (1,5,9999,20000)
    group by measured_at;
    
    measured_at                   | s1                | s5              | s9999             | s20000
    2024-09-07 12:58:24.030178+01 | 0.362470663311556 | 0.5754996219675 | 0.800965200844344 | 0.906566857051784
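
    If you go the EAV route, per-sensor trend queries will want an index; a minimal sketch (my assumption, not part of the original demos):

    create index on your_signals(source_sensor, measured_at);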