Tags: database, dimension

High dimensionality on data sets


I have seen many questions on SO asking for your opinion on how you define a huge table in a database....

I have seen that you have answered >= 1 billion rows, etc..

Doing some research, and following Wegman and Solka, they classify a table as huge when it has more than 10^10 entries.

My question refers to the number of columns in a table. I know 1,000 columns is a very considerable number (supposing we have 1 billion rows)...

  • Do you know of any paper where they measure or state when a table is huge depending on columns AND rows?

Solution

  • As far as SQL Server is concerned, the column limit on a normal table is 1,024, so I would consider any number of columns approaching that limit to be large. That said, you can use wide tables to extend the limit to 30,000, but there are tradeoffs (a minimal sketch of a wide-table definition follows this answer):

    A wide table is a table that has defined a column set. Wide tables use sparse columns to increase the total number of columns that a table can have to 30,000. The number of indexes and statistics is also increased to 1,000 and 30,000, respectively. The maximum size of a wide table row is 8,019 bytes. Therefore, most of the data in any particular row should be NULL. To create or change a table into a wide table, you add a column set to the table definition. The maximum number of nonsparse columns plus computed columns in a wide table remains 1,024.

    By using wide tables, you can create flexible schemas within an application. You can add or drop columns whenever you want. Keep in mind that using wide tables has unique performance considerations, such as increased run-time and compile-time memory requirements.

    According to this thread, the limit for PostgreSQL is 1,600 columns per table.

    Based on these numbers, I would consider any number of columns approaching 1,000 to be huge.
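
For illustration, here is a minimal T-SQL sketch of the wide-table mechanism described above. The table and column names are hypothetical; the point is that declaring columns as SPARSE and adding a column set is what turns an ordinary table into a wide table:

    -- Hypothetical example: SPARSE columns plus a column set make this a wide table.
    CREATE TABLE dbo.ProductAttributes
    (
        ProductID      int           NOT NULL PRIMARY KEY,
        Name           nvarchar(100) NOT NULL,                 -- regular (nonsparse) column
        Color          nvarchar(20)  SPARSE NULL,              -- sparse columns cost little storage when mostly NULL
        WeightKg       decimal(9,3)  SPARSE NULL,
        WarrantyMonths int           SPARSE NULL,
        AllAttributes  xml COLUMN_SET FOR ALL_SPARSE_COLUMNS   -- the column set that defines a wide table
    );

    -- Each row stores only the sparse columns it actually populates.
    INSERT INTO dbo.ProductAttributes (ProductID, Name, Color)
    VALUES (1, N'Desk lamp', N'Black');

    -- The column set exposes all populated sparse columns as a single XML value.
    SELECT ProductID, Name, AllAttributes
    FROM dbo.ProductAttributes;

Note that once a table has a column set, SELECT * returns the column set as one XML column rather than each sparse column individually, although the sparse columns can still be queried by name.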