BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage News MariaDB 11.0 Brings a New Optimizer Cost Model and More

MariaDB 11.0 Brings a New Optimizer Cost Model and More

Bookmarks

Ten years after the last release, MariaDB Server 11.0 has been released. This version has several changes, including a new optimizer cost model that aims to predict more accurately the actual cost of each query execution plan and the removal of the InnoDB change buffer.

According to the MariaDB foundation, most optimizer problems, such as reporting performance issues about bad query plans, had been informed by the community users.

The flagship feature of MariaDB 11.0 is the new optimizer cost model, which is aimed at being able to predict more accurately the cost of each query execution plan. However, there is no guarantee that it will be better in all scenarios; some queries can be even slower. The idea is to increase the MariaDB major version as a signal.

Before MariaDB 11.0, the MariaDB query optimizer used a "basic cost" of 1 for one disk access, fetching a key, fetching a row based on the rowid from the key, besides other small costs. Those costs are reasonable when finding the best index to use, but not so good for a table scan, an index scan, or a range lookup.

MariaDB 11.0 changed the basic cost for "storage engine operations" to be one millisecond, which means that for most queries, the cost should be close to the time the server spends in the storage engine + join_cache + sorting.

The engine costs have been separated into small pieces to improve accuracy. The disk read costs now assume by default an SSD disk with 400/second, which can be changed by modifying the OPTIMIZER_DISK_READ_COST.

All engine-specific costs can be found in the information_schema.optimizer_costs. To see the default costs for an engine, the following query is available:

select * from information_schema.optimizer_costs where engine="default"\G

And the results will be:

*************************** 1. row ***************************
                         ENGINE: default
       OPTIMIZER_DISK_READ_COST: 10.240000
OPTIMIZER_INDEX_BLOCK_COPY_COST: 0.035600
     OPTIMIZER_KEY_COMPARE_COST: 0.011361
        OPTIMIZER_KEY_COPY_COST: 0.015685
      OPTIMIZER_KEY_LOOKUP_COST: 0.435777
   OPTIMIZER_KEY_NEXT_FIND_COST: 0.082347
      OPTIMIZER_DISK_READ_RATIO: 0.020000
        OPTIMIZER_ROW_COPY_COST: 0.060866
      OPTIMIZER_ROW_LOOKUP_COST: 0.130839
   OPTIMIZER_ROW_NEXT_FIND_COST: 0.045916
   OPTIMIZER_ROWID_COMPARE_COST: 0.002653
      OPTIMIZER_ROWID_COPY_COST: 0.002653

To change the default costs, it is only required to specify the cost as an argument, such as set global optimizer_disk_read_cost=20.

To check the default costs for InnoDB storage engine, just run

select * from information_schema.optimizer_costs where engine="innodb"\G

Following the results:

*************************** 1. row ***************************
                         ENGINE: InnoDB
       OPTIMIZER_DISK_READ_COST: 10.240000
OPTIMIZER_INDEX_BLOCK_COPY_COST: 0.035600
     OPTIMIZER_KEY_COMPARE_COST: 0.011361
        OPTIMIZER_KEY_COPY_COST: 0.015685
      OPTIMIZER_KEY_LOOKUP_COST: 0.791120
   OPTIMIZER_KEY_NEXT_FIND_COST: 0.099000
      OPTIMIZER_DISK_READ_RATIO: 0.020000
        OPTIMIZER_ROW_COPY_COST: 0.060870
      OPTIMIZER_ROW_LOOKUP_COST: 0.765970
   OPTIMIZER_ROW_NEXT_FIND_COST: 0.070130
   OPTIMIZER_ROWID_COMPARE_COST: 0.002653
      OPTIMIZER_ROWID_COPY_COST: 0.002653

Let's take a look at the meaning of the different cost variables.

Variable Type Description
OPTIMIZER_DISK_READ_COST

Engine

Time in microseconds to read a 4K block from a disk/SSD. The default is set for a 400MB/second SSD

OPTIMIZER_INDEX_BLOCK_COPY_COST

Engine

Cost to lock and copy a block from the global cache to a local cache. This cost is added for every block accessed, independent of whether they are cached or not

OPTIMIZER_KEY_COMPARE_COST

Engine

Cost to compare two keys

OPTIMIZER_KEY_COPY_COST

Engine

Cost to copy a key from the index to a local buffer as part of searching for a key

OPTIMIZER_KEY_LOOKUP_COST

Engine

Cost to find a key entry in the index (index read)

OPTIMIZER_KEY_NEXT_FIND_COST

Engine

Cost to find the next key in the index (index next)

OPTIMIZER_DISK_READ_RATIO

Engine

The ratio of BLOCK_NOT_IN_CACHE/CACHE_READS. The cost of disk usage is calculated as estimated_blocks * OPTIMIZER_DISK_READ_RATIO * OPTIMIZER_DISK_READ_COST. A value of 0 means that all blocks are always in the cache. A value of 1 means that a block is never in the cache

OPTIMIZER_ROW_COPY_COST

Engine

Cost of copying a row to a local buffer. Should be slightly more than OPTIMIZER_KEY_COPY_COST

OPTIMIZER_ROW_LOOKUP_COST

Engine

Cost to find a row based on the rowid (Rowid is stored in the index together with the key)

OPTIMIZER_ROW_NEXT_FIND_COST

Engine

Cost of finding the next row

OPTIMIZER_ROWID_COMPARE_COST

Engine

Cost of comparing two rowids

OPTIMIZER_ROWID_COPY_COST

Engine

Cost of copying a rowid from the index

OPTIMIZER_SCAN_SETUP_COST

Session

Cost of starting a table or index scan. This has a low value to encourage the optimizer to use index lookup, also tables with very few rows

OPTIMIZER_WHERE_COST

Session

Cost to execute the WHERE clause for every found row. Increasing this variable will encourage the optimizer to find plans which read fewer rows

More information on the costs and how they were calculated can be found in the Docs/optimizer_costs.txt file in the MariaDB Source distributions.

Let's have a look at other MariaDB optimizer cost changes:

  • When counting disk accesses, it is assumed that all rows and index data are cached for the duration of the query.
  • The cost of sorting (filesort) is now more accurate, allowing the optimizer to better choose between an index scan and a filesort for ORDER BY/GROUP BY queries.

The new optimizer changes usually matter, and probably should be able to find a better plan in the following scenarios:

  • When running queries with more than two tables
  • Indexes with a lot of identical values
  • Ranges that cover more than 10% of a table (WHERE key between 1 and 1000 -- Table has values 1-2000)
  • Complex queries when not all used columns are or can be indexed
  • Queries mixing different storage engines, like using tables from both InnoDB and Memory in the same query
  • When needing to use FORCE INDEX to get a good plan
  • If using ANALYZE TABLE made the plans worse (or not good enough)
  • Queries that have lots of derived tables (subselects)
  • Using ORDER BY / GROUP BY that could be resolved via indexes

More details are available on how to upgrade between Major MariaDB versions, as well as how to upgrade from Maria DB 10.11 to 11.0.

About the Author

Rate this Article

Adoption
Style

BT