Tag Archive for 'design'

Everything you need to know about designing MySQL InnoDB primary keys

There are two aspects of how MySQL/InnoDB use a table’s primary key on the file system that have important ramifications for the key’s design:

  • The primary key determines the order in which the data is physically stored in the main data file, aka “the clustered index”. Another way of saying this: the main data file is a B-tree index that directly contains all of a table’s columns, and the key on this B-tree is the primary key. This can result in one less I/O operation (compared to MyISAM) on many queries.
  • The primary key is what is used to associate all of a table’s indexes with the main data file. So the primary key is replicated in every row of every index.

How to design your InnoDB primary keys:

  1. Do explicitly define a primary key, and make it numeric. If no primary key is specified, or a non-numeric column is used as a primary key, InnoDB will used its own internal auto-incremented row ID to link between the main data table and the other indexes. This row ID is 6 bytes. The biggest numeric type in MySQL, BIGINT, is 8 bytes, and the next smallest type, INT, is 4 bytes. So not only is this “hidden” row ID an entire extra redundant column, it is also bigger than the data type that would be used in the vast majority of cases where a primary key is explicitly specified. Furthermore, when InnoDB falls back to using this internal row ID, the rows in the main data file are ordered by insertion order instead of your primary key, rendering InnoDB’s clustering behavior useless, and adding an entire extra I/O operation on some queries. This is a big loss.
  2. As alluded to above, the primary key should be sequential. Because the structure of the data on the disk is dictated by the primary key, it should be sequential, to avoid extremely expensive I/O on writes (when the B-Tree has to be reorganized), and to take advantage of the native clustering.
  3. The primary key should be as small as possible. Because the primary key is replicated in every entry of every index, designing the primary key to be as small as possible impacts the efficiency of all the indexes.

Using these 3 simple guidelines on a well-designed schema, InnoDB can perform at its full potential.

Above I mentioned that the primary key is stored in every row of every index. This includes the entire contents of a multi-column primary key. In a future article I will discuss scenarios/designs that can take advantage of this behavior.




Close
Powered by ShareThis