Search code examples
mysqloptimizationforeign-keyscharvarchar

MySQL : Table optimization word 'guest' or memberid in the column


This question is for MySQL (it allows many NULLs in the column which is UNIQUE, so the solution for my question could be slightly different).

There are two tables: members and Table2. Table members has: memberid char(20), it's a primary key. (Please do not recommend to use int(11) instead of char(20) for memberid, I can't change it, it contains exactly 20 symbols).

Table2 has:

CREATE TABLE IF NOT EXISTS `Table2`
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
    memberid varchar(20) NOT NULL,
    `Time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status tinyint(4) NOT NULL,
    PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;

Table2.memberid is a word 'guest' (could be repeated many times) or a value from members.memberid (it also could be repeated many times). Any value from Table2.memberid column (if not 'guest') exists in members.memberid column. Again, members.memberid is unique. Table2.memberid, even excluding words 'guest' is not unique.

So, Table2.memberid column looks like: 'guest' 'lkjhasd3lkjhlkjg8sd9' 'kjhgbkhgboi7sauyg674' 'guest' 'guest' 'guest' 'lkjhasd3lkjhlkjg8sd9'

Table2 has INSERTS and UPDATES only. It updates only status. Criteria for updating status: set status=0 WHERE memberid='' and status=1. So, it could be updated once or not updated at all. As result, the number of UPDATES is less or equal (by statistics it is twice less) than number of INSERTS.

The question is only about optimization. The question could be splitted as:

1) Do you HIGHLY recommend to replace the word 'guest' to NULL or to a special 'xxxxxyyyyyzzzzz00000' (20 symbols like a 'very special and reserved' string) so you can use chars(20) for Table2.memberid, because all values are char(20)?

2) What about using a foreign key? I can't use it because of the value 'guest'. That value can NOT be in members.memberid column.

Using another words, I need some help to decide:

  • wether I can use 'guest' (I like that word) -vs- choosing 20-char-reserved-string so I can use char(20) instead of varchar(20) -vs- keeping NULLs instead of 'guest',

  • all values, except 'guest' are actually foreign keys. Is there any possible way to use this information for increasing the performance?

  • That table is used pretty often so I have to build Table2 as good as I can. Any idea is highly appreciated.

Thank you.

Added: Well... I think I have found a good solution, that allows me to treat memberid as a foreign key.


Solution

  • 1) Do you HIGHLY recommend to replace the word 'guest' to NULL or to a special 'xxxxxyyyyyzzzzz00000' (20 symbols like a 'very special and reserved' string) so you can use chars(20) for Table2.memberid, because all values are char(20)?

    Mixing values from different domains always causes trouble. The best thing to do is fix the underlying stuctural problem. Bad design can be really expensive to work around, and it can be really expensive to fix.

    Here's the issue in a nutshell. The simplest data integrity constraint for this kind of issue is a foreign key constraint. You can't use one, because "guest" isn't a memberid. (Member ids are from one domain; "guest" isn't part of that domain; you're mixing values from two domains.) Using NULL to identify a guest doesn't help much; you can't distinguish guests from members whose memberid is missing. (Using NULL to identify anything is usually a bad idea.)

    If you can use a special 20-character member id to identify all guests, it might be wise to do so. You might be lucky, in that "guest" is five letters. If you can use "guestguestguestguest" for the guests without totally screwing your application logic, I'd really consider that first. (But, you said that seems to treat guests as logged in users, which I think makes things break.)

    Retrofitting a "users" supertype is possible, I think, and this might prove to the the best overall solution. The supertype would let you treat members and guests as the same sometimes (because they're not utterly different), and different at other times (because they're not entirely the same). A supertype also allows both individuals (members) and aggregate users (guests all lumped together) without undue strain. And it would unify the two domains, so you could use foreign key constraints for members. But it would require changing the program logic.

    In Table2 (and do find a better name than that, please), an index on memberid or a composite index on memberid and status will perform just about as well as you can expect. I'm not sure whether a composite index will help; "status" only has two values, so it's not very selective.

    all values, except 'guest' are actually foreign keys. Is there any possible way to use this information for increasing the performance?

    No, they're not foreign keys. (See above.) True foreign keys would help with data integrity, but not with SELECT performance.

    "Increasing the performance" is pretty much meaningless. Performance is a balancing act. If you want to increase performance, you need to specify which part you want to improve. If you want faster inserts, drop indexes and integrity constraints. (Don't do that.) If you want faster SELECT statements, build more indexes. (But more indexes slows the INSERTS.)

    You can speed up all database performance by moving to hardware that speeds up all database performance. (ahem) Faster processor, faster disks, faster disk subsystem, more memory (usually). Moving critical tables or indexes to a solid-state disk might blow your socks off.

    Tuning your server can help. But keep an eye on overall performance. Don't get so caught up in speeding up one query than you degrade performance in all the others. Ideally, write a test suite and decide what speed is good enough before you start testing. For example, say you have one query that takes 30 seconds. What's acceptable improvement? 20 seconds? 15 seconds? 2 milliseconds sounds good, but is an unlikely target for a query that takes 30 seconds. (Although I've seen that kind of performance increase by moving to better table and index structure.)