1. ORDER BY sorting
MySQL supports two ways of sorting, FileSort and Index:
- Index is highly efficient: MySQL returns the rows in order by scanning the index itself, so no separate sort step is needed.
- FileSort is less efficient: MySQL reads the rows and sorts them itself, without help from an index.
Therefore, ORDER BY should sort by index whenever possible.
2. Optimize index sorting
ORDER BY can sort by Index when either of two conditions is met:
- The ORDER BY clause uses the leftmost prefix columns of the index.
- The condition columns of the WHERE clause and the columns of the ORDER BY clause together form a leftmost prefix of the index.
Therefore, when using ORDER BY, try to create a suitable index that satisfies one of these two conditions.
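The two conditions above can be sketched as a small check. This is a simplified model and not MySQL's actual optimizer logic: the function name and the column names are hypothetical, it assumes the WHERE columns are compared to constants with equality, and it ignores ASC/DESC direction.

```python
def can_sort_by_index(index_cols, order_by_cols, where_eq_cols=()):
    """Simplified leftmost-prefix check for ORDER BY (illustrative only)."""
    pinned = set(where_eq_cols)
    pos = 0
    for col in order_by_cols:
        # Index columns pinned to a constant by WHERE do not change the order,
        # so they may be skipped before the next ORDER BY column.
        while pos < len(index_cols) and index_cols[pos] != col and index_cols[pos] in pinned:
            pos += 1
        if pos == len(index_cols) or index_cols[pos] != col:
            return False
        pos += 1
    return True


index = ["a", "b", "c"]  # hypothetical composite index (a, b, c)
print(can_sort_by_index(index, ["a", "b"]))                   # ORDER BY a, b
print(can_sort_by_index(index, ["b"]))                        # ORDER BY b
print(can_sort_by_index(index, ["b"], where_eq_cols=["a"]))   # WHERE a = ? ORDER BY b
```

Under this model the first and third calls satisfy the leftmost-prefix rule, while the second does not because it skips column a.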
3. Optimize filesort sorting
If the sort is not on index columns, MySQL has to use FileSort, which has two algorithms: the dual-channel (two-pass) sorting algorithm and the single-channel (single-pass) sorting algorithm.
3.1 Introduction to two sorting algorithms
- Dual-channel sorting algorithm: before MySQL 4.1, FileSort used dual-channel sorting, which literally means scanning the disk twice to obtain the data. MySQL first reads the row pointer and the ORDER BY columns and sorts them in the buffer, then scans the sorted list and reads the corresponding rows from disk again according to the row pointers in the list. In short: take the sort fields from disk, sort them in the buffer, and then take the other fields from disk.
To get a batch of data, the disk has to be scanned twice. I/O is time-consuming, so after MySQL 4.1 an improved algorithm appeared: the single-channel sorting algorithm.
- Single-channel sorting algorithm: read all the columns the query needs from disk, sort them in the buffer by the ORDER BY columns, and then scan the sorted list for output. This is faster because it avoids reading the data a second time and turns random I/O into sequential I/O, but it uses more space because it keeps every row in memory.
Overall, it is more efficient than the dual-channel sorting algorithm.
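To make the difference concrete, here is a toy simulation of the two algorithms in Python. It is only a sketch of the idea, not how MySQL implements FileSort: the in-memory dictionary stands in for the table on disk, and the table contents, function names, and the "reads" counter are all hypothetical.

```python
# Hypothetical "table on disk": row pointer -> full row.
TABLE = {
    1: {"name": "carol", "age": 41, "city": "Oslo"},
    2: {"name": "alice", "age": 29, "city": "Lima"},
    3: {"name": "bob", "age": 35, "city": "Rome"},
}


def dual_channel_sort(table, sort_col, cols):
    reads = 0
    # Pass 1: read only (sort key, row pointer) pairs and sort them in the buffer.
    buffer = []
    for row_id, row in table.items():
        reads += 1
        buffer.append((row[sort_col], row_id))
    buffer.sort()
    # Pass 2: follow each row pointer back to the table for the other columns.
    result = []
    for _, row_id in buffer:
        reads += 1
        result.append(tuple(table[row_id][c] for c in cols))
    return result, reads


def single_channel_sort(table, sort_col, cols):
    reads = 0
    # Single pass: read every needed column once and keep it in the buffer.
    buffer = []
    for row in table.values():
        reads += 1
        buffer.append((row[sort_col], tuple(row[c] for c in cols)))
    buffer.sort(key=lambda item: item[0])
    return [payload for _, payload in buffer], reads


rows_a, reads_a = dual_channel_sort(TABLE, "age", ["name", "age"])
rows_b, reads_b = single_channel_sort(TABLE, "age", ["name", "age"])
print(rows_a == rows_b, reads_a, reads_b)
```

Both functions return the same ordered rows, but the dual-channel version touches the table twice per row, while the single-channel version holds the full payload of every row in its buffer.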
But the single-channel sorting algorithm has a problem: if the sort buffer is too small, the rows read from disk, which carry every column the query needs, cannot all be held in the sort buffer. In that case the single-channel sorting algorithm runs into trouble, and the dual-channel sorting algorithm may actually perform better.
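A rough back-of-the-envelope model shows why a small sort buffer hurts the single-channel algorithm more. The function and all sizes here are hypothetical, chosen only for illustration; real row formats and MySQL's internal bookkeeping are more involved.

```python
import math


def sorted_runs(row_count, bytes_per_row, sort_buffer_bytes):
    """Number of buffer-sized chunks needed to sort all rows (rough model)."""
    rows_per_buffer = max(1, sort_buffer_bytes // bytes_per_row)
    return math.ceil(row_count / rows_per_buffer)


ROWS = 100_000
BUFFER = 256 * 1024  # assumed 256 KiB sort buffer

# Dual-channel keeps only sort key + row pointer (assumed 16 bytes per row).
print(sorted_runs(ROWS, 16, BUFFER))
# Single-channel keeps every selected column (assumed 400 bytes per row).
print(sorted_runs(ROWS, 400, BUFFER))
```

With these assumed sizes the narrow dual-channel entries need 7 chunks, while the wide single-channel rows need 153, each of which has to be sorted separately and merged afterwards.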
3.2 Optimization ideas
Optimization strategy for the single-channel sorting algorithm:
- Increase the sort_buffer_size parameter.
- Increase the max_length_for_sort_data parameter.
Improve the speed of ORDER BY sorting:
- Avoid SELECT * with ORDER BY; select only the fields you need. The impact here is:
  - When the total size of the queried fields is less than max_length_for_sort_data and the sort field is not of a TEXT or BLOB type, the single-channel sorting algorithm is used; otherwise the dual-channel sorting algorithm is used.
  - With either algorithm, the data may exceed the capacity of the sort buffer. When it does, tmp files are created and merge-sorted, which causes multiple I/Os. The risk is greater with the single-channel sorting algorithm, so sort_buffer_size needs to be increased.
  - Select only the fields you need. If a covering index exists, then even when the sort order does not match the index order and the index cannot be used directly for sorting, MySQL can still scan the index instead of doing a full table scan, which reduces the number of I/Os and speeds up sorting. (The exact behavior depends on the MySQL version.)
- Try to increase sort_buffer_size: whichever algorithm is used, increasing this parameter improves efficiency. It must be sized according to what the system can afford, because this buffer is allocated per session rather than shared.
- Try to increase max_length_for_sort_data: increasing this parameter raises the probability that the single-channel sorting algorithm is used. But if it is set too high, the total data is more likely to exceed sort_buffer_size; the obvious symptoms are high disk I/O activity and low processor usage.
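The selection rule between the two algorithms, as stated in this section, can be written as a small function. This is a sketch of the rule as described here, not MySQL's actual implementation; the function name, the column sizes, and the threshold value of 1024 are assumptions made for illustration.

```python
LOB_TYPES = {"TEXT", "BLOB"}


def choose_algorithm(selected_sizes, sort_col_type, max_length_for_sort_data):
    """Pick the FileSort algorithm using the rule described in this section."""
    total = sum(selected_sizes)
    if total < max_length_for_sort_data and sort_col_type.upper() not in LOB_TYPES:
        return "single-channel"
    return "dual-channel"


# Selecting a few narrow columns stays under the (assumed) threshold.
print(choose_algorithm([4, 40, 8], "INT", 1024))
# SELECT * that drags in a wide column pushes the total over the threshold.
print(choose_algorithm([4, 40, 8, 2000], "INT", 1024))
```

The second call illustrates why SELECT * is discouraged: one wide column is enough to switch the sort to the dual-channel algorithm.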