I have the following sql table called datatable
,
date yesteday today tomorrow reading source
2021-01-01 x x x x 1
2021-01-01 x x x x 2
2021-01-01 x x x x 3
...
2021-01-02 x x x x 1
2021-01-02 x x x x 2
2021-01-02 x x x x 3
...
2021-05-31 x x x x 1
2021-05-31 x x x x 2
2021-05-31 x x x x 3
When I created the table, I set the following,
create table datatable
(
date date,
yesterday real,
today real,
tomorrow real,
reading real,
source varchar
)
Then I created an index:
create index datatable_idx on datatable (date, source)
Every day, the source numbers will repeat themselves. So date together with source would be unique.
I want to improve my data collection and storing process. I was told that I should create a primary key.
I'm guessing the command to use here would be
ALTER TABLE datatable ADD PRIMARY KEY (date, source)
My question is why should I do this and what is the difference between this and the index I created. Will it affect my process?
Also next time I create a table, do I have to create an index and primary like this or is there a way to do both upon creation?
A primary key is a constraint that specifies that the values in its column(s) must be (mutually) unique and not null. This is so that it can be guaranteed to uniquely identify every row in the table, which is most useful when wanting to create foreign keys that refer to this table. It's still worth having if you don't have other tables, as it will prevent your table getting into a bad state where, for example, you have more than one of the same source value on a particular date.
The primary key field(s) will almost always have an index on them as well as they are used often for lookups and JOINs, but the two concepts are separate.
Some DBMSs (e.g. MySQL, SQL Server) automatically create a clustered index on the primary key, meaning that data in the table is sorted on disk by the field(s) comprising the primary key to make the common operations above even faster. However, postgres does not do this by default.
You can specify the primary key on table creation using the following syntax:
create table datatable
(
date date,
yesterday real,
today real,
tomorrow real,
reading real,
source varchar,
PRIMARY KEY (source, date)
)
You can also add a new, auto-incrementing integer field to act as your primary key (generally referred to as a surrogate key or artificial key). You might want to do this if you don't have any other good candidates in your data items, but it can have other benefits as well (potentially faster JOINs for example).
create table datatable
(
id serial primary key,
date date,
yesterday real,
today real,
tomorrow real,
reading real,
source varchar
)