web123456

ClickHouse FAQs and Solutions

1 Overview

When performing performance tests on ClickHouse with distribution table + copy table + zookeeper, we encountered the following pitfalls and sorted them out.

2 Distribution table join problem Unknown identifier: LO_CUSTKEY, context:…

1.1 Problem Description

SQL is as follows:

SELECT count(1)
FROM performance.line_all AS c 
LEFT JOIN performance.customer_all AS l ON l.C_CUSTKEY = c.LO_CUSTKEY

Execute the SQL error as follows:

Received exception from server (version 19.4.0):
Code: 47. DB::Exception: Received from 10.0.0.50:9000. DB::Exception: Received from ambari04:9000, 10.0.0.54. DB::Exception: Unknown identifier: LO_CUSTKEY, context: query: 'LO_CUSTKEY' required_names: 'LO_CUSTKEY' source_tables: table_aliases: complex_aliases: masked_columns: array_join_columns: source_columns: .

According to the error message, you can not know LO_CUSTKEY, this connection field

1.2 Solution

Distribution table join. In the join condition after on, the join field of the table followed by from is placed in front. Modify SQL as follows:

SELECT count(1)
FROM performance.customer_all AS c 
LEFT JOIN performance.line_all AS l ON l.C_CUSTKEY = c.LO_CUSTKEY

2 Unknown status, Cannot allocate block number in ZooKeeper: , ZooKeeper session has been expired…

2.1 Problem Description

If you encounter the following error when executing SQL:

↑ Progress: 157.94 million rows, 6.91 GB (92.63 thousand rows/s., 4.05 MB/s.) Received exception from server (version 19.4.0):
Code: 319. DB::Exception: Received from 10.0.0.50:9000. DB::Exception: Unknown status, client must retry. Reason: Connection loss. 
↖ Progress: 94.47 million rows, 4.18 GB (95.07 thousand rows/s., 4.20 MB/s.) Received exception from server (version 19.4.0):
Code: 999. DB::Exception: Received from 10.0.0.50:9000. DB::Exception: Cannot allocate block number in ZooKeeper: Coordination::Exception: Connection loss. 
lineorder_flat_all.: Code: 225, () = DB::Exception: Received from ambari02:9000, 10.0.0.52. DB::Exception: ZooKeeper session has been expired.. Stack trace:

According to the error message, it can be seen that the connection with Zookeeper is lost, resulting in block numbers being unable to be allocated. Because clickhouse has a heavy dependence on zookeeper, the metadata information of tables, the information of each data block, and the data is synchronized every time they are inserted and data needs to be interacted with zookeeper. During the sync logs process, Zookeerper service will cause ZK to be unable to respond to external requests, which will cause session expiration and other problems.

2.2 Solution

(1) Increase the maximum timeout time of zookeeper session, modify MaxSessionTimeout=120000 in , and restart zookeeper after modification.
Note: Do not set the timeout of zookeeper too much, as it will be very slow if the service is slack.
(2) Zookeeper's snapshot file storage disk shall not be less than 1T, please pay attention to the cleaning strategy.
(3) The dataLogDir storage directory in zookeeper should be separated from dataDir, and a set of storage devices can be used to store ZK logs.
(4) Added in: forceSync=no. The default is on. To avoid synchronization delay problem, ZK will immediately synchronize the current status information to the disk log file after receiving the data, and will only reply after synchronization is completed. After closing this item, the client connection can be quickly responded. After turning off the forceSync option, there will be potential risks. Although the disk will still be flushed (() is executed first), because the operating system will write cache first in order to improve the disk writing efficiency. When the machine is abnormal, some zk status information may not be synchronized to the disk, resulting in different information before and after ZK.
(5) When creating a clickhouse table, add the use_minimalistic_part_header_in_zookeeper parameter to compress and store the metadata, but it cannot be rolled back after modification.

3. Table is in readonly mode

3.1 Problem Description

If SQL encounters the following error when executing insertion data:

2020.05.28 10:59:11.048910 [ 47 ] {} <Error> lineorder_flat_all.: Code: 242, () = DB::Exception: Received from ambari04:9000, 10.0.0.54. DB::Exception: Table is in readonly mode. Stack trace:

It is because the zookeeper pressure is too high and the table is in "read only mode" mode, which causes the insertion to fail.

3.2 Solution

(1) The dataLogDir storage directory in zookeeper should be separated from dataDir, and a set of storage devices can be used to store ZK logs separately.
(2) Do a good job in planning the zookeeper cluster and clickhouse cluster. Multiple zookeeper clusters can serve a set of clickhouse clusters.

4 Clickhouse cluster zookeeper data is lost, Can’t get data for node /clickhouse/tables/…

4.1 Problem Description

If the following error is found in the log

Cannot create table from metadata file /var/lib/clickhouse/metadata/xx/, error: Coordination::Exception: Can’t get data for node /clickhouse/tables/xx/cluster_xxx-01/xxxx/metadata: node doesn’t exist (No node), stack trace:

It is because the zookeeper data is lost, which makes the clickhouse database unable to start.

4.2 Solution

(1) Backup SQL under /var/lib/clickhouse/metadata/ and data under /var/lib/clickhouse/data/ after deletion
(2) Start the database
(3) Create a MergeTree table with the original table data structure
(4) Copy the data folder of the previous distributed table to the data directory of the new table.
(5) Restart the database
(6) Recreate the original structure local table
(7) Recreate the original structure distributed table
(8) insert into [Distributed Table] select * from [MergeTree Table]