
ORDER BY optimization

1. Sorting methods

MySQL supports two ways of sorting: FileSort and Index.

  • Index is highly efficient: MySQL reads the rows in index order, so the index itself delivers the sorted result.
  • FileSort is less efficient: MySQL reads the rows and then sorts them itself, without help from an index.

Therefore, ORDER BY should use an index for sorting whenever possible.
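
One way to see which method a query gets is the Extra column of EXPLAIN. The sketch below assumes a hypothetical table emp with a composite index idx_age_birth (neither comes from the original text). The outcomes noted in the comments are what MySQL typically reports and can vary with version and data.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE IF NOT EXISTS emp (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  age   INT,
  birth DATE,
  KEY idx_age_birth (age, birth)
) ENGINE = InnoDB;

-- Sorted by the leading index column: Extra typically shows "Using index"
-- and no "Using filesort", i.e. the Index method.
EXPLAIN SELECT age, birth FROM emp ORDER BY age;

-- Sorted by a column that is not the leading one: Extra typically
-- includes "Using filesort", i.e. the FileSort method.
EXPLAIN SELECT age, birth FROM emp ORDER BY birth;
```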

2. Optimize index sorting

ORDER BY can sort by Index in two situations:

  • The ORDER BY clause uses the leftmost prefix columns of the index.
  • The condition columns of the WHERE clause combined with the ORDER BY columns form the leftmost prefix of the index.

Therefore, when using ORDER BY, try to create a suitable index that satisfies one of these two situations.
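
The two situations can be sketched with EXPLAIN against a hypothetical table emp indexed on (age, birth). The table, the index name and the equality condition in WHERE are assumptions chosen for illustration, and the optimizer's actual choice can differ by version and data.

```sql
-- Hypothetical table; the composite index covers (age, birth) in that order.
CREATE TABLE IF NOT EXISTS emp (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  age   INT,
  birth DATE,
  KEY idx_age_birth (age, birth)
) ENGINE = InnoDB;

-- Situation 1: ORDER BY uses the leftmost prefix of the index.
EXPLAIN SELECT age, birth FROM emp ORDER BY age;
EXPLAIN SELECT age, birth FROM emp ORDER BY age, birth;

-- Situation 2: an equality condition on age in WHERE plus ORDER BY birth
-- together form the leftmost prefix (age, birth).
EXPLAIN SELECT age, birth FROM emp WHERE age = 30 ORDER BY birth;

-- Counter-examples: the leading column is skipped or the column order is
-- reversed, so "Using filesort" is expected in the Extra column.
EXPLAIN SELECT age, birth FROM emp ORDER BY birth;
EXPLAIN SELECT age, birth FROM emp ORDER BY birth, age;
```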

3. Optimize filesort sorting

If the sort is not on an index column, MySQL has to fall back to FileSort, which has two algorithms: the two-pass (dual-channel) sorting algorithm and the single-pass (single-channel) sorting algorithm.

3.1 Introduction to two sorting algorithms

  • Two-pass sorting algorithm: before MySQL 4.1, MySQL used two-pass sorting, which literally means reading the disk twice to obtain the data. It first reads the row pointers and the ORDER BY columns and sorts them in the buffer, then scans the sorted list and reads the corresponding rows from disk again according to the row pointers in the list. In short: fetch the sort columns from disk, sort them in the buffer, then fetch the other columns from disk.

Fetching one batch of data therefore requires reading the disk twice. Because I/O is expensive, MySQL 4.1 introduced an improved algorithm, the single-pass sorting algorithm.

  • Single-pass sorting algorithm: read all the columns the query needs from disk, sort them in the buffer by the ORDER BY columns, and then scan the sorted list for output. It is faster because it avoids reading the data a second time and turns random I/O into sequential I/O, but it uses more space because it keeps every full row in memory.

Overall, it is more efficient than the two-pass sorting algorithm.

But the single-pass sorting algorithm has one problem: if the sort buffer is too small to hold all the columns read from disk, the single-pass algorithm runs into trouble, and in that case the two-pass algorithm can actually perform better.
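
Whether a sort is overflowing the sort buffer can be observed from the server's status counters. The following is a minimal sketch using the Sort_merge_passes status variable, which counts the merge passes the sort has had to perform. The before-and-after workflow is an assumption for illustration, not something from the original text.

```sql
-- Current size of the sort buffer, in bytes.
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Note the counter value before the query.
SHOW SESSION STATUS LIKE 'Sort_merge_passes';

-- (run the ORDER BY query under investigation here)

-- Read the counter again and compare with the first value.
SHOW SESSION STATUS LIKE 'Sort_merge_passes';
```

If the counter grows noticeably after the query, the sorted data probably did not fit in the buffer and was merged through temporary files.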

3.2 Optimization ideas

Optimization strategy for the single-pass sorting algorithm:

  • Increase the sort_buffer_size parameter.
  • Increase the max_length_for_sort_data parameter.
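
As a rough sketch, both parameters can be inspected and raised for the current session as shown below. The values are purely illustrative assumptions, not recommendations. Note also that on MySQL 8.0.20 and later, max_length_for_sort_data is deprecated and has no effect, so the second setting only matters on older versions.

```sql
-- Inspect the current values.
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'max_length_for_sort_data';

-- Raise them for the current session only (illustrative values).
SET SESSION sort_buffer_size = 4 * 1024 * 1024;  -- 4 MiB
SET SESSION max_length_for_sort_data = 8192;     -- bytes
```

Setting the values at session level keeps the experiment local to one connection, which makes it easy to compare query behavior before and after without affecting other clients.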

Ways to improve the speed of ORDER BY sorting:

  • Using SELECT * together with ORDER BY is a major taboo; select only the columns you actually need. The reasons are:

    • When the total size of the queried columns is less than max_length_for_sort_data and the sort column is not of type TEXT or BLOB, the single-pass sorting algorithm is used; otherwise the two-pass sorting algorithm is used.
    • With either algorithm, the data may exceed the capacity of the sort buffer. When it does, temporary files are created and merge-sorted, resulting in multiple I/O operations. The risk is greater with the single-pass sorting algorithm, so sort_buffer_size should be increased.
    • Select only the columns you need. If a covering index exists, then even when the sort order does not match the index order and the index cannot be used directly for sorting, MySQL can still scan the index instead of doing a full table scan, which reduces the number of I/O operations and speeds up the sort. (The exact behavior depends on the MySQL version.)
  • Try increasing sort_buffer_size: whichever algorithm is used, raising this parameter improves efficiency. It must be raised within the limits of the system's capacity, however, because the buffer is allocated per connection (session), not once for the whole server.

  • Try increasing max_length_for_sort_data: raising this parameter increases the probability that the single-pass sorting algorithm is used. But if it is set too high, the total data is more likely to exceed sort_buffer_size; the obvious symptoms are high disk I/O activity and low processor usage.
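
To check which algorithm a particular query actually gets, the optimizer trace can be used (available in MySQL 5.6 and later). The sketch below assumes a hypothetical table article with a TEXT column. How TEXT and BLOB columns influence the choice depends on the MySQL version and settings, so treat it only as a way to observe the decision, not as a statement of what the result will be.

```sql
-- Hypothetical table with a large TEXT column, for illustration only.
CREATE TABLE IF NOT EXISTS article (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  title      VARCHAR(100),
  body       TEXT,
  created_at DATETIME
) ENGINE = InnoDB;

INSERT INTO article (title, body, created_at) VALUES
  ('a', 'first body',  '2024-01-02 00:00:00'),
  ('b', 'second body', '2024-01-01 00:00:00');

SET optimizer_trace = 'enabled=on';

-- Only the needed columns are selected.
SELECT id, title FROM article ORDER BY created_at;
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

-- SELECT * drags the TEXT column into the query.
SELECT * FROM article ORDER BY created_at;
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

In each trace, look for the sort_mode entry under filesort_summary: a mode mentioning rowid corresponds to the two-pass algorithm, while a mode mentioning additional_fields corresponds to the single-pass algorithm. The exact strings differ between MySQL versions.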