web123456

【Flink】State backend (applicable for version 1.13 and versions greater than version 1.13)

FlinkWhy is it becoming more and more popular? Computing fault tolerance is very important in the field of big data. One of the methods of Flink computing fault tolerance is the checkpoint mechanism, which will save data in CheckPoint, and the data will be persisted with CheckPoint to prevent data loss and ensure consistency during recovery. So where is CheckPoint stored? The answer is in the state backend.

Some changes have occurred in the backend of the Flink state after version 1.13 and the new version after version 1.13

The new version of Flink status backend is as follows:

  • HashMapStateBackend
  • EmbeddedRocksDBStateBackend

 HashMapStateBackend:(Data is stored in the heap as Java objects. State and windows in the form of Key/valueOperatorIt will hold a hash table, which stores the status value and trigger. Suitable for jobs with larger states, longer windows and larger key/value states)

(1) HashMapStateBackend only stores memory, which is equivalent to the previous version of MemoryStateBackend. This situation applies to development and usage methods

StreamExecutionEnvironment env = ();
(new HashMapStateBackend());
().setCheckpointStorage(new JobManagerCheckpointStorage());

(2) HashMapStateBackend puts the state into the file system, which is used as fault tolerance and can be used in the production environment. The red part can be "hdfs://namenode:40010/flink/checkpoints" or "file:///data/flink/checkpoints".

StreamExecutionEnvironment env = ();
(new HashMapStateBackend());
().setCheckpointStorage("file:///checkpoint-dir");

EmbeddedRocksDBStateBackend:(Save running status data inRocksDBIn the database, the RocksDB database stores data in the TaskManager data directory by default. It is suitable for jobs with very large states, very long windows, and very large key/value states, and can be used in production environments)

How to use

1. Introducemavenrely

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
    <version>${}</version>
    <scope>provided</scope>
</dependency>

2. Use EmbeddedRocksDBStateBackend, the red part can be "hdfs://namenode:40010/flink/checkpoints" or "file:///data/flink/checkpoints".

StreamExecutionEnvironment env = ();
(new EmbeddedRocksDBStateBackend());
().setCheckpointStorage("file:///checkpoint-dir");