Is there really a performance difference between string and int columns (table design)?

I am interested in this issue. Every time I design a table, I have this doubt. Take a posts table as an example: it contains a column named post_type, which can hold one of the following values:

post(varchar) or 1(tinyint)
page(varchar) or 2(tinyint)
revision(varchar) or 3(tinyint)

The question is which type I should use for that column. varchar makes query results more intuitive, since I don't need to figure out what 1/2/3 mean.

As for tinyint, does it perform better than varchar?

PS: I am using MySQL.

Solution

Data types don't have performance. They are a storage format.

Queries do have performance. So to evaluate performance, you should be specific about which query you are trying to measure.

In a query that merely fetches the row by its primary key, there's no practical difference. InnoDB keeps columns for a given row together on a page, so once it has fetched the page from disk into RAM, all the columns are available. The difference between reading 4 bytes for an integer vs. reading 8 bytes for a string like 'revision' is insignificant.

SELECT post_type FROM posts WHERE post_id = 8675309;

If you're looking up rows by their post_type value, then the data type becomes a little more important, because MySQL has to compare the value in each row it examines to decide whether that row belongs in the result. Depending on the number of rows, and whether you have an index, the difference between string comparisons and integer comparisons could be important.

SELECT ... FROM posts WHERE post_type = 'revision';

I created a table and filled it with > 1 million rows:

create table posts (
  post_id serial primary key, 
  post_type_utf varchar(10), 
  post_type_bin varbinary(10), 
  post_type_int int
);
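The loading step isn't shown here. As a minimal sketch of one way to do it, assuming every row holds the same 'revision' value (which is consistent with each count below matching the whole table), you could seed one row and then double the table repeatedly:

```sql
-- Assumption: seed a single row; post_id is filled in by auto-increment.
insert into posts (post_type_utf, post_type_bin, post_type_int)
values ('revision', 'revision', 1);

-- Each run of this statement doubles the row count.
-- Running it 20 times gives 2^20 = 1,048,576 rows.
insert into posts (post_type_utf, post_type_bin, post_type_int)
select post_type_utf, post_type_bin, post_type_int from posts;
```

This is only an illustration of how such a test table could be populated, not necessarily how the rows in this test were generated.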

Then I timed how long it takes to search the whole table:

mysql> select count(*) from posts where post_type_utf = 'revision';
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.24 sec)

mysql> select count(*) from posts where post_type_bin = binary 'revision';
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.15 sec)

mysql> select count(*) from posts where post_type_int = 1;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.15 sec)

The timings suggest that searching for an integer takes about the same time as searching for a binary string, and that both are faster than searching for a utf8 string.

Why is a utf8 string slower? Because every string comparison has to evaluate character by character, against the collation defined for the column. A binary string comparison can just use memcmp() to compare the whole string in one operation.
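To see the collation at work, here is a rough sketch against the same test table. It assumes post_type_utf uses a case-insensitive collation, which is the usual default but depends on your server and table settings:

```sql
-- Assumption: post_type_utf has a case-insensitive collation.
-- The collation treats 'REVISION' and 'revision' as equal,
-- so this search can still match the stored rows.
select count(*) from posts where post_type_utf = 'REVISION';

-- A binary comparison is byte for byte, so a different case does not match.
select count(*) from posts where post_type_bin = binary 'REVISION';

-- Shows the collation actually in use for each column.
show full columns from posts;
```

The extra work of applying those collation rules to every row is what shows up in the timing difference.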

It's also important to consider that indexes are usually a greater factor for performance than which data type you choose. Indexes help because your query for a specific post_type value will only examine those rows that match.

But in this case, you only have a few distinct values for the post_type, so a search in an index is likely to match many rows regardless.
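If you want to check this on your own data, a minimal sketch is to add an index and look at the optimizer's plan. The index name idx_post_type_int is hypothetical; pick whatever fits your naming scheme:

```sql
-- Hypothetical index name; any name works.
alter table posts add index idx_post_type_int (post_type_int);

-- "key" shows whether the index is used;
-- "rows" is the optimizer's estimate of how many entries it will examine.
explain select count(*) from posts where post_type_int = 1;

-- Cardinality is an estimate of the number of distinct values in each index.
show index from posts;
```

With only a few distinct post_type values, expect the estimated row count to stay a large fraction of the table even when the index is used.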