I have a table in SQL Server with a large amount of data, around 40 million rows. The base structure is like this:
Title | type | length | Null distribution |
---|---|---|---|
Customer-Id | number | 8 | 60% |
Card-Serial | number | 5 | 70% |
- | - | - | - |
- | - | - | - |
Note | string-unicode | 2000 | 40% |
Both numeric columns are filled with numbers of a specific length.
I have no idea which data type to choose to keep the database as small as possible while still getting good performance from an index on the `customerId` column. According to this post, if I choose `CHAR(8)`, the database consumes 8 bytes per row even for NULL values.
I decided to use `INT` to reduce the database size and get a good index, but NULL values will still use 4 bytes per row. To reduce the size further I could use `VARCHAR(8)`, but I don't know whether an index on that type performs well. The main question is: which matters more, reducing the database size or having a good index on a numeric type?
Thanks.
If it is a number, then by all means choose a numeric datatype!! Don't store your numbers as `char(n)` or `varchar(n)`!! That'll just cause you immeasurable grief and headaches later on.
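One common source of that grief is ordering and comparison. The sketch below is plain Python rather than SQL Server, and the serial values are made up for illustration, but it shows the general effect: text is compared character by character, so numbers stored as strings stop sorting and comparing like numbers as soon as their lengths differ.

```python
# Hypothetical serial numbers, held once as text and once as integers.
serials_as_text = ["9", "10", "100", "25"]
serials_as_int = [int(s) for s in serials_as_text]

# Strings are compared character by character, integers by value.
text_sorted = sorted(serials_as_text)
int_sorted = sorted(serials_as_int)

print(text_sorted)   # ['10', '100', '25', '9']
print(int_sorted)    # [9, 10, 25, 100]
print("9" > "10")    # True, because '9' sorts after '1'
```

Zero-padding every value to the same width would hide the problem, but then every writer and reader of the column has to remember to do it.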
The choice is pretty clear:
- if you have whole numbers, use `TINYINT`, `SMALLINT`, `INT` or `BIGINT`, depending on the number range you need
- if you need fractional numbers, use `DECIMAL(p,s)` for the best and most robust behaviour (no rounding errors like `FLOAT` or `REAL`)
Picking the most appropriate datatype is much more important than any micro-optimization for storage. Even with 40 million rows - that's still not a big issue, whether you use 4 or 8 bytes. Whether you use a numeric type vs. a string type - that makes a huge difference in usability and handling of your database!
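To put the 4 versus 8 bytes in perspective, here is a rough back-of-the-envelope calculation. It assumes the column occupies its full fixed width in every row, NULL or not (as described in the question), and it ignores row overhead, index copies and any compression, so treat the figures as an order of magnitude only.

```python
ROWS = 40_000_000  # row count taken from the question


def column_bytes(bytes_per_row, rows=ROWS):
    """Raw bytes one fixed-width column occupies across all rows."""
    return bytes_per_row * rows


int_total = column_bytes(4)    # INT: 4 bytes per row
char8_total = column_bytes(8)  # CHAR(8): 8 bytes per row
difference = char8_total - int_total

# Report in decimal megabytes (1 MB = 1,000,000 bytes).
print(int_total / 1_000_000)    # 160.0
print(char8_total / 1_000_000)  # 320.0
print(difference / 1_000_000)   # 160.0
```

Under those assumptions the whole difference is in the region of 160 MB for the column.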