Search code examples
mysqlmariadbdatabase-partitioningmyisam

MySQL Open Files (MyISAM, Parititioned Large Database)


I have a quite large database that I am working on, a collection of measurement data. I am running into issues with hitting the max open files (status global variable Open_files hitting the global variable open_files_limit). Sure, I can increase the open_files_limit in /etc/my.cnf, but it seems that I still can't get enough breathing room - heavy user load will ultimately still cause me to hit this condition.

I'm trying to get a better idea as to why so many files are opened, and why they don't seem to close.

Size Summary: Total number of tables in the database is approximately 3,000. Each table is using the MyISAM engine, and each table consists of ~500 columns (variable data types) and has 173 partitions. Each partition represents a week of data.

Table structures (all ~3,000) are each created like this:

CREATE TABLE `tbl_0000` (
  `time_unix` int(10) unsigned NOT NULL,
  `time_msecs` smallint(5) unsigned NOT NULL,
  `col_000` tinyint(4) DEFAULT NULL,
  `col_001` tinyint(4) DEFAULT NULL,
  `col_002` double DEFAULT NULL,
  ...
  `col_0501` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`time_unix`,`time_msecs`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
 PARTITION BY RANGE (`time_unix`)
(PARTITION `p0` VALUES LESS THAN (1649977029) ENGINE = MyISAM,
 PARTITION `p1` VALUES LESS THAN (1650581829) ENGINE = MyISAM,
...
 PARTITION `p172` VALUES LESS THAN MAXVALUE ENGINE = MyISAM)

Yes, I know - it's a large amount of data. MyISAM is being used for fast reading purposes (and transactions are not needed). This has generally worked extremely well for my setup, it's just when it's getting aggressively/atypically hammered on the reading side that I run into these issues.

My basic questions and observations:

  1. If I perform a query like "SELECT COUNT(*) from tbl_0023 WHERE time_unix > 1649977029 AND time_unix < 1650581829; ... Open_files increases by 346. This happens to be [number of partitions]x2, and tells me that this query must be opening each and every partition (both the data file .MYD and the index file .MYI, hence the factor of x2). This puzzles me, as I would think that I'm only dipping into a single partition (p1) with this query. Does this have something to do with the composite key versus the partition single-column? My hunch is that it does, but I'm not sure why.

  2. When will these open files begin to close? It is not clear to me which variable controls this (or if there is one even settable). I can see that the "offending" user/connection (that performed the query) has no bearing on this ... if they cease to exist the open files remain. Ideally, I'd like a way to control this (e.g. set it to perhaps an hour). FLUSH TABLES; closes them all, of course, but I gather that this is not intended to be called/used in this manner.


Solution

    • table_open_cache is a "cache". If it is not big enough, some will be bumped out.
    • InnoDB uses fewer file handles. Assuming innodb_file_per_table = ON, it needs only one handle per table (as opposed to 2). But, Innodb disk footprint is usually 2x-3x bigger than MyISAM.
    • It is an old wives tale that MyISAM is faster. After I see more info on your system (see Comments), I will think about whether your case is an exception. What version of MySQL?
    • ulimit -n, or systemd LimitNOFILE, It feeds to MySQL's open_files_limit, which is a cap for table_open_cache`.
    • That value of 1024000 may cause other problems.
    • Do you ever purge old data? Or is that "173 partitions" a growing number?
    • How much RAM do you have?
    • Are your queries usually in the 'latest' partition? Or are they randomly scattered? (This can have a big impact on performance.)
    • I don't see any secondary indexes; do you only fetch by time?
    • DOUBLE (8 bytes) is rarely a useful datatype, especially for measurements. Consider FLOAT (4 bytes, ~7 significant digits).
    • MySQL has already deprecated MyISAM; MariaDB may follow suit some day.
    • That sounds like an average of 3K rows per partition? (1-3KB per row, 3000*173 partitions, total of 1-2 TB.) I suggest that it is hardly worth partitioning, especially if the PRIMARY KEY is the only index you have. Are there 500 different sensors you are tracking on each of 3000 devices or vehicles (or whatever) that you are tracking? No partitioning, no multiple tables, simply PRIMARY KEY(sensor_id, datetime) will be much simpler and probably more efficient.
    • I assume the data comes in ~chronologically for each of the 3000 items. In a single-table solution, with the PK above, there would be 3000 hot spots. This is not a problem for performance problem. If fact it would be more efficient in InnoDB due to the layout of the data on disk.
    • Sounds like you have lots of Summary tables, too. You don't need a separate summary table for each granularity. Instead, have one, maybe two. Perhaps "day" and "month". For in-between things (hour, week), sum up the values in the appropriate summary table. More discussion: Summary Tables
    • PARTITION BY TO_DAYS(...) (or TO_SECONDS) for the single table would provide a single (not 3000) DROP + REORGANIZE every day (or week) to purge data older than 3 years. More: Partition
    • InnoDB organizes the data based on the PRIMARY KEY.
    • MyISAM tacks new rows onto the end of the table until there have been deletes. THen it give preference to filling in the holes. This may lead to a very fragmented table. MyISAM requires an extra index lookup for the PK that InnoDB avoids.