Search code examples
pythonsqlitereplacesql-delete

How to delete Sqlite rows that contain old data for the same field but with different timestamps?


I have a unique situation. I haven't found a similar question yet. Here's the table I have.

    TIMESTAMP               SYMBOL NAME PRICE YEAR-LOW YEAR-HIGH               STATEMENT-DATE
2022-06-12 17:32:37.117340  AOS A. O. Smith Corporation 58.06   56.61   86.74   2021-12-31
2022-06-12 17:32:37.109389  AOS A. O. Smith Corporation 58.06   56.61   86.74   2020-12-31
2022-06-12 17:32:37.101411  AOS A. O. Smith Corporation 58.06   56.61   86.74   2019-12-31
2022-06-12 17:32:37.093402  AOS A. O. Smith Corporation 58.06   56.61   86.74   2018-12-31
2022-06-12 17:32:37.026740  AOS A. O. Smith Corporation 58.06   56.61   86.74   2017-12-31
2022-06-12 17:32:29.742554  MMM 3M Company  137.65  137.58  203.59  2021-12-31
2022-06-12 17:32:29.727191  MMM 3M Company  137.65  137.58  203.59  2019-12-31
2022-06-12 17:32:29.654842  MMM 3M Company  137.65  137.58  203.59  2017-12-31
2022-06-12 17:32:08.582652  AOS A. O. Smith Corporation 58.06   56.61   86.74   2021-12-31
2022-06-12 17:32:08.574681  AOS A. O. Smith Corporation 58.06   56.61   86.74   2020-12-31
2022-06-12 17:32:08.565711  AOS A. O. Smith Corporation 58.06   56.61   86.74   2019-12-31
2022-06-12 17:32:08.558671  AOS A. O. Smith Corporation 58.06   56.61   86.74   2018-12-31
2022-06-12 17:32:07.904663  AOS A. O. Smith Corporation 58.06   56.61   86.74   2017-12-31
2022-06-12 17:32:00.701647  MMM 3M Company  137.65  137.58  203.59  2021-12-31
2022-06-12 17:32:00.685993  MMM 3M Company  137.65  137.58  203.59  2020-12-31
2022-06-12 17:32:00.670601  MMM 3M Company  137.65  137.58  203.59  2018-12-31
2022-06-12 17:32:00.604380  MMM 3M Company  137.65  137.58  203.59  2017-12-31

As you can see, I have the same data replicated many times, each time the Python program is run.

I want to make the statement-date column Unique of course, and the problem would be solved. However, many other companies may have the same statement-date, and since they are part of the same table, rows for other companies don't get written to the database. So, the statement-date column can NOT be unique in my case, which is what presents this problem.

So, everytime I run the program, the data for a company (field) gets reinserted.

How should I fix this? Either;

a. Do NOT insert values if the company's data already exists for the statement-data column. As I covered before, the statement-data column can be the same for other companies and therefore can't be unique. So, how do I achieve this?

OR

b. Write a Delete statement where I delete older data for a company.

Which would be the most efficient? And how do I write the delete statement in this case?

If a is not an option, then I need help writing something like DELETE FROM TABLE-NAME WHERE SYMBOL EXISTS FOR STATEMENT-DATE AND TIMESTAMP IS NOT LATEST? What's the command to do so? Is it possible?


Solution

  • You can create a composite unique index on the columns SYMBOL and STATEMENT-DATE so that their combination is unique:

    CREATE UNIQUE INDEX idx_symbol_date 
    ON tablename (SYMBOL, `STATEMENT-DATE`);