I have a MySQL very large database (1 billion rows) like this:
database : products("name","caracteristics")
Both columns are VARCHAR(50).
actually, it have no KEY sat, but "name" will be unique, so I think I will alter it as "name" PRIMARY_KEY. (I should have done that before.. now I need to perform a remove duplicate query before adding primary_key option I guess)
My problem is, when performing a simple query on the table, it takes ages literally.
SELECT caracteristics WHERE name=blabla LIMIT 1;
//takes ages.
I was thinking of partitioning the existing table.
So here are the question:
Is my idea of ALTER TABLE to set 'name' column as PRIMARY_KEY a good idea also?
also about the duplicate query, I found this around here, am I doing it properly? (don't want to mess up my table...)
delete a from products a left join( select max(name) maxname, caracteristics from products group by caracteristics) b on a.name = maxname and a.caracteristics= b.caracteristics where b.maxname IS NULL;
you can also direct set a PRIMARY KEY with the ignore option like this:
ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);
this will delete all duplicates from name.
sample
MariaDB [l]> CREATE TABLE `products` (
-> `name` varchar(50) NOT NULL DEFAULT '',
-> `caracteristics` varchar(50) DEFAULT NULL
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]> INSERT INTO `products` (`name`, `caracteristics`)
-> VALUES
-> ('val1', 'asdfasdfasdf'),
-> ('val2', 'asdasDasd'),
-> ('val3', 'aesfawfa'),
-> ('val1', '99999999');
Query OK, 4 rows affected (0.01 sec)
Records: 4 Duplicates: 0 Warnings: 0
MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf |
| val2 | asdasDasd |
| val3 | aesfawfa |
| val1 | 99999999 |
+------+----------------+
4 rows in set (0.00 sec)
MariaDB [l]> ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);
Query OK, 4 rows affected (0.03 sec)
Records: 4 Duplicates: 1 Warnings: 0
MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf |
| val2 | asdasDasd |
| val3 | aesfawfa |
+------+----------------+
3 rows in set (0.00 sec)
MariaDB [l]>
test ADD PRIMARY KEY / INSERT IGNORE
Here is a test between add Primary key and insert ignore into. and you can see that add Primary key (90 sec / 120 sec) is a little bit faster in this sample
MariaDB [l]> CREATE TABLE `bigtable10m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable10m`
-> select lpad(seq,8,'0') from seq_1_to_10000000;
Query OK, 10000000 rows affected (24.24 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> SELECT * FROM `bigtable10m` LIMIT 10;
+----------+
| id |
+----------+
| 00000001 |
| 00000002 |
| 00000003 |
| 00000004 |
| 00000005 |
| 00000006 |
| 00000007 |
| 00000008 |
| 00000009 |
| 00000010 |
+----------+
10 rows in set (0.00 sec)
MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (28.49 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (29.01 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (32.98 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> ALTER IGNORE TABLE `bigtable30m` ADD PRIMARY KEY(id);
Query OK, 30000000 rows affected (1 min 32.34 sec)
Records: 30000000 Duplicates: 20000000 Warnings: 0
MariaDB [l]>
MariaDB [l]> DROP TABLE `bigtable30m`;
Query OK, 0 rows affected (0.52 sec)
MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (37.29 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (41.87 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (30.87 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> CREATE TABLE bigtable_unique (
-> `id` varchar(32) NOT NULL DEFAULT '',
-> PRIMARY KEY (id)
-> );
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT IGNORE bigtable_unique SELECT * FROM `bigtable30m`;
Query OK, 10000000 rows affected, 65535 warnings (1 min 57.99 sec)
Records: 30000000 Duplicates: 20000000 Warnings: 20000000
MariaDB [l]>