Elasticsearch Basics Quick Start (II): Features of Elasticsearch Versions

Contents

  • Elasticsearch Basics Quick Start (II): Features of Elasticsearch Versions
  • Preface
  • 1. Before Elasticsearch 5.0
  • 2. Elasticsearch 5.x
  • 3. Elasticsearch 6.x
  • 4. Elasticsearch 7.x
  • Summary
  • Statement
  • References

Preface

After the previous section, you should have a basic understanding of what Elasticsearch is and what it is mainly used for. This article walks through the features introduced in each major Elasticsearch version.


1. Before Elasticsearch 5.0

The versions before 5.0 did not differ much from one another. Early Elasticsearch version numbering was inconsistent, and the project jumped directly to 5.0.


2. Elasticsearch 5.x

Main features:

  • Based on Lucene 6.x

    Query performance increased by 25%, and the default scoring mechanism changed from TF-IDF to BM25

  • Contention locks were removed at the internal engine level while still preventing concurrent updates to the same document, bringing a 15%-20% performance improvement

  • Provides the first native Java REST client SDK

  • Added Ingest Node

  • Provides Painless scripting as a replacement for Groovy scripting

  • Added Profile API

  • Added Rollover API

  • Added Reindex

  • Introduces the new field types text and keyword to replace string

  • Limits the size of index requests so that a flood of concurrent requests cannot overwhelm the cluster

  • Limits the number of shards a single request may hit, 1000 by default

  • Can only be started by a non-root user
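To make the move from TF-IDF to BM25 more concrete, here is a minimal sketch of the textbook BM25 score for a single term. It is an illustration, not Lucene's actual code: the function name and its parameters are hypothetical, and k1 = 1.2 and b = 0.75 are the commonly cited defaults.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document (illustrative only)."""
    # Rare terms (small doc_freq) receive a larger inverse document frequency.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: documents longer than average are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # The term-frequency part saturates: it can never exceed k1 + 1.
    return idf * tf * (k1 + 1) / (tf + norm)


once = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, num_docs=1000, doc_freq=10)
twice = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, num_docs=1000, doc_freq=10)
print(once, twice)
```

Unlike plain TF-IDF, doubling the term frequency does not double the score, so a document cannot dominate the ranking just by repeating a word.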


3. Elasticsearch 6.x

Main features:

  • Based on Lucene 7.0


  • Sparse doc values support

    Elasticsearch's doc values are a columnar store that holds the original values of each document's fields. Sparseness refers to the fact that documents within one index can have very different structures. If every field reserves storage for every document, then an index in which only a few documents carry many of the fields can waste a huge amount of disk space.

    The doc values optimization solves this problem. It reduces disk usage, shortens merge times, and improves query throughput, because the file system cache is used more effectively and reads and writes are faster.
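The space saving can be illustrated with a toy accounting model. This is only a sketch of the idea described above, not Lucene's on-disk format; the function names and the sample fields are made up for the example.

```python
def dense_slots(num_docs, field_values):
    """Dense layout: every field reserves one slot for every document."""
    return num_docs * len(field_values)


def sparse_slots(field_values):
    """Sparse layout: a field stores only the documents that have a value."""
    return sum(len(values) for values in field_values.values())


# Hypothetical index with 1000 documents: every document has a timestamp,
# but only two documents carry an error_code.
field_values = {
    "timestamp": {doc_id: 0 for doc_id in range(1000)},
    "error_code": {3: 500, 42: 404},
}

print(dense_slots(1000, field_values))   # slots reserved by the dense layout
print(sparse_slots(field_values))        # slots actually needed
```

The rarer a field is across documents, the larger the gap between the two numbers becomes.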


  • Index sorting

    Index sorting means sorting at index time. Queries often sort on the value of a particular field, such as a timestamp or a number. If that sorting is done up front during indexing, searches and aggregations become much faster because they can read the pre-sorted index directly. The trade-off is extra overhead at index time, so the feature suits scenarios where the index does not change much.
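As a rough sketch, index sorting is configured when the index is created through the index.sort.field and index.sort.order settings. The helper below only builds the request body; the function name, the timestamp field, and the typeless mappings layout (the 7.x form, whereas 6.x nests the properties under a type name) are assumptions made for illustration.

```python
import json


def sorted_index_body(field, order="desc", field_type="date"):
    """Build a create-index body that sorts segments by one field (sketch)."""
    return {
        "settings": {
            "index": {
                "sort.field": field,   # field to sort by at index time
                "sort.order": order,   # "asc" or "desc"
            }
        },
        # The sort field must be mapped; typeless (7.x style) mapping assumed.
        "mappings": {"properties": {field: {"type": field_type}}},
    }


body = sorted_index_body("timestamp")
print(json.dumps(body, indent=2))
```

The body would be sent with the index creation request; sort settings cannot be added to an index after it has been created.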


  • Sequence number support

    Every operation in Elasticsearch is assigned a sequence number. This is an internal mechanism that enables fast recovery or synchronization of shard replicas, node recovery across data centers, and potentially even a Changes API.
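The benefit for replica recovery can be sketched in a few lines. This is a simplified model, not Elasticsearch's implementation: the Operation class and the ops_to_replay helper are hypothetical, and the replica is assumed to know the highest sequence number it has fully processed.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    seq_no: int     # sequence number assigned by the primary
    doc_id: str
    source: dict


def ops_to_replay(primary_ops, replica_checkpoint):
    """Return only the operations the replica has not yet processed."""
    return [op for op in primary_ops if op.seq_no > replica_checkpoint]


primary_ops = [Operation(n, f"doc-{n}", {"value": n}) for n in range(5)]

# The replica went offline after processing sequence number 2, so only the
# operations numbered 3 and 4 need to be sent to it.
missing = ops_to_replay(primary_ops, replica_checkpoint=2)
print([op.seq_no for op in missing])
```

Without sequence numbers, the safe fallback is to copy the shard's files in full; with them, a returning replica can catch up by replaying a short tail of operations.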


  • Seamless rolling upgrade

    A cluster can be rolling-upgraded from the last 5.x release to the last 6.x release without a full cluster restart. In other words, the upgrade happens online, with no need to stop the service.


  • Removal of types

    In 6.0, multiple types in one index are no longer supported, and every new index has only one fixed, virtual type: _doc. Parent-child relationships, which used to be based on types, are implemented through a separate join field instead. Types are deprecated in 7.0 on the way to their complete removal.
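A minimal sketch of a mapping that uses the join field in place of parent and child types is shown below. The helper name, the my_join_field field name, and the question/answer relation are illustrative choices; the body follows the 6.x layout, where the properties sit under the single _doc type.

```python
import json


def join_mapping(parent, child, field="my_join_field"):
    """Build a 6.x-style mapping with one parent/child relation (sketch)."""
    return {
        "mappings": {
            "_doc": {  # the single type of a 6.x index
                "properties": {
                    field: {
                        "type": "join",
                        "relations": {parent: child},  # parent name -> child name
                    }
                }
            }
        }
    }


print(json.dumps(join_mapping("question", "answer"), indent=2))
```

Parent and child documents then live in the same index and the same type, and each document states in the join field which side of the relation it belongs to.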


  • Index template inheritance

    Previously, all index templates that matched an index were merged, which could cause conflicts between templates. According to this change, 6.0 matches only one template, and the template is also validated when the index is created.


  • Load-aware shard routing

    Requests are routed based on load. Previously, search requests were distributed round-robin across all nodes, so the slowest node often raised overall latency. The new implementation adjusts automatically based on how long requests spend in each node's queue: heavily loaded nodes get a shorter queue, so other nodes take on more of the pressure. Both search and indexing will use this mechanism.

    Closed indices will also have their replicas handled automatically, so that the data stays reliable.
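The difference between round-robin and load-aware routing can be sketched as follows. This is a deliberately simplified model and not the formula Elasticsearch uses: the NodeStats class, the rank calculation, and the smoothing factor are all assumptions made for illustration.

```python
class NodeStats:
    """Tracks a smoothed response time and the last seen queue size."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight given to the newest measurement
        self.ewma_ms = None     # exponentially weighted moving average
        self.queue_size = 0

    def record(self, response_ms, queue_size):
        if self.ewma_ms is None:
            self.ewma_ms = float(response_ms)
        else:
            self.ewma_ms = self.alpha * response_ms + (1 - self.alpha) * self.ewma_ms
        self.queue_size = queue_size

    def rank(self):
        # Hypothetical ranking: slow responses and long queues both count against a node.
        return self.ewma_ms * (1 + self.queue_size)


def pick_node(stats):
    """Choose the node with the lowest (best) rank."""
    return min(stats, key=lambda name: stats[name].rank())


stats = {"node-a": NodeStats(), "node-b": NodeStats()}
stats["node-a"].record(response_ms=10, queue_size=1)
stats["node-a"].record(response_ms=12, queue_size=1)
stats["node-b"].record(response_ms=200, queue_size=4)
print(pick_node(stats))
```

Round-robin would send every second request to the slow node; ranking by observed load steers traffic away from it until it recovers.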


  • Search across multiple Elasticsearch clusters (cross-cluster search)

    As before, Elasticsearch 6.0 can read indices created in the previous major version but cannot read indices created in older versions. The difference is that you no longer have to reindex all old indices: you can keep them in a cluster running the older version and use cross-cluster search to query both clusters at once.
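In a cross-cluster search request, an index on a remote cluster is addressed as cluster_alias:index_name. The small helper below builds such a search path; the function name, the old_cluster alias, and the index names are hypothetical, and the remote cluster is assumed to be registered already in the cluster settings.

```python
def search_path(targets):
    """Build a _search path from (cluster_alias, index) pairs.

    A cluster_alias of None means the index lives on the local cluster.
    """
    parts = [f"{cluster}:{index}" if cluster else index for cluster, index in targets]
    return "/" + ",".join(parts) + "/_search"


# Search a local index and an index kept on an older remote cluster together.
print(search_path([(None, "logs-new"), ("old_cluster", "logs-old")]))
```

The request is sent to the local cluster, which fans the search out to the remote cluster and merges the results.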


4. Elasticsearch 7.x

Main features:

  • Based on Lucene 8.0

    Elasticsearch 7 is based on Lucene 8.0.0. In terms of index compatibility, it can directly load indices created by Elasticsearch 6.0 or later. Indices created by earlier versions must be reindexed before Elasticsearch 7 can use them.


  • Top-K optimization

    Lucene 8.0.0 brings many performance optimizations, and the main highlight is Top-K query optimization. In previous versions, a query scored every matching document. Users often search for words such as 'a' or 'the', which add little to a document's score but force the query to score a very large number of documents.

    If a search only needs to return the top K results rather than an exact count of all hits, this work can be skipped, and Lucene 8 introduced the WAND algorithm to do so. The optimization has no effect when the number of matching documents is smaller than the hit count the query is asked to track. Query QPS improves greatly once the total number of hits no longer has to be counted.
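The idea behind this kind of pruning can be sketched with per-term score upper bounds. The code below is a much simplified illustration in the spirit of WAND, not Lucene's implementation: the function name and the postings layout are assumptions, and real engines skip through sorted postings lists instead of visiting every document ID.

```python
import heapq


def top_k_with_pruning(postings, k):
    """postings maps term -> {doc_id: score}. Returns (top_k, docs_scored)."""
    # Upper bound on what each term can contribute to any document.
    max_score = {term: max(docs.values()) for term, docs in postings.items()}
    doc_ids = sorted({doc for docs in postings.values() for doc in docs})
    heap = []          # min-heap of (score, doc_id) holding the current top k
    docs_scored = 0
    for doc in doc_ids:
        terms = [term for term, docs in postings.items() if doc in docs]
        upper_bound = sum(max_score[term] for term in terms)
        # Skip the document if even its best case cannot beat the weakest top-k entry.
        if len(heap) == k and upper_bound <= heap[0][0]:
            continue
        docs_scored += 1
        score = sum(postings[term][doc] for term in terms)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True), docs_scored


postings = {
    "the": {1: 0.125, 2: 0.125, 3: 0.125, 4: 0.125, 5: 0.125},  # common, low score
    "lucene": {2: 2.0, 4: 1.5},                                   # rare, high score
}
print(top_k_with_pruning(postings, k=2))
```

All five documents match the query, but documents that contain only the common word are skipped once the top two slots are filled. The price is that the total number of matches is no longer known exactly.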


  • Cluster connection changes

    TransportClient is deprecated, so Java code written for ES 7 should connect through the REST client.


  • Changes in the ES data storage structure

    Support for multiple types under a single index is officially abolished. Back in the ES 6 days, the official announcement already said that types would be dropped in ES 7, and ES 6 restricted each index to a single type. ES 7 uses _doc as the default type, and the official statement is that types will be removed completely in a later version. API request formats change accordingly. For example, to fetch the document with a given ID from a given index: GET index/_doc/id, where index and id are concrete values.
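The typeless request format can be sketched as a small path builder. The helper name and the sample index are hypothetical; only the index/_doc/id shape comes from the text above.

```python
from urllib.parse import quote


def doc_path(index, doc_id):
    """Build the typeless document path used by GET, PUT and DELETE requests."""
    # Percent-encode both parts so unusual characters cannot break the URL.
    return f"/{quote(index, safe='')}/_doc/{quote(str(doc_id), safe='')}"


print(doc_path("products", 42))
```

A client would then issue GET on this path against the cluster's HTTP endpoint to fetch the document.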


  • High-level REST client changes

    API methods that accept a header parameter have been removed, and the Cluster Health API now defaults to the cluster level.


  • The ES package bundles a JDK by default

    As a result, the package suddenly grew to 300MB+. Compared with the previous major version it is 200MB+ larger, which is exactly the size of the JDK.


  • Default configuration changes

    The default node name is now the host name, and the default number of shards is now 1 instead of 5, which helps avoid oversharding.
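Because the default shard count differs between major versions, it can be safer to state it explicitly when an index is created. The sketch below builds such a settings body; the function name and the chosen values are illustrative, while number_of_shards and number_of_replicas are the standard index settings.

```python
import json


def index_settings(shards=1, replicas=1):
    """Build a create-index body with an explicit shard and replica count (sketch)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,      # fixed when the index is created
                "number_of_replicas": replicas,  # can be changed later
            }
        }
    }


print(json.dumps(index_settings(shards=3), indent=2))
```

Stating the value keeps the index layout the same regardless of which version's default would otherwise apply.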


Summary

That is the main content of this article, which briefly introduced the features of each major Elasticsearch version. You should now have a grasp of the key points. For more details on the features of each version, please refer to the official documentation.


Statement

All of the content above was collected from the Internet. If there are any errors, please bear with me.


References

Elasticsearch: The Definitive Guide

Breaking changes in 5.0

Breaking changes in 6.3

Breaking changes in 6.0

Breaking changes in 7.9

Breaking changes in 7.8

Breaking changes in 7.7

Breaking changes in 7.6

Breaking changes in 7.5

Breaking changes in 7.4

Breaking changes in 7.3

Breaking changes in 7.2

Breaking changes in 7.1

Breaking changes in 7.0

Elasticsearch version compatibility