Keywords:
KingbaseES, Large Object, vacuumlo
Large Object Type
KES provides a large object data type (lo) that users access in a streaming fashion. Streaming access is useful for data too large to read or write in a single operation.
KES splits each large object into multiple chunks and stores the chunks as rows in the database. A B-tree index lets KES quickly locate the correct chunk number when reading or writing a large object.
KES also supports the TOAST storage system, which automatically moves values larger than one database page into secondary storage associated with each table. The advantage of large objects over TOAST is size: a large object can hold up to 4 TB, while a TOAST value is limited to 1 GB. In addition, parts of a large object can be read and updated efficiently, while most operations on a TOAST value read or write the entire value.
Access to a large object must be performed within a transaction block. In JDBC, for example, you can call setAutoCommit(false) to open one.
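Because lo_open() returns a descriptor that is only valid inside the current transaction, the read/seek functions from the interface list later in this document must be wrapped in BEGIN/COMMIT. A minimal sketch (the mode value 0x40000 is INV_READ in PostgreSQL-compatible systems; the OID and descriptor value 0 are illustrative):

```sql
BEGIN;
SELECT lo_open(16909, x'40000'::int);  -- open for reading (INV_READ); returns a descriptor, e.g. 0
SELECT lo_lseek64(0, 0, 2);            -- seek descriptor 0 to the end (whence 2 = SEEK_END)
SELECT lo_tell64(0);                   -- current position = size of the object in bytes
SELECT lo_close(0);
COMMIT;                                -- the descriptor becomes invalid here
```

Running lo_lseek64 or lo_tell64 outside a transaction block fails, because each statement would otherwise commit immediately and invalidate the descriptor.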
How to use
When creating a table, define an oid column to store the large object's identifier. A large object is created with the lo_creat()/lo_create() functions, which return its OID. Its content can be imported from a file with lo_import() and exported with lo_export(). See the following examples:
CREATE TABLE image ( name text, raster oid );

SELECT lo_creat(-1);       -- returns OID of new, empty large object
SELECT lo_create(43213);   -- attempts to create large object with OID 43213
SELECT lo_unlink(173454);  -- deletes large object with OID 173454

INSERT INTO image (name, raster)
    VALUES ('beautiful image', lo_import('/etc/motd'));
INSERT INTO image (name, raster)  -- same as above, but specify OID to use
    VALUES ('beautiful image', lo_import('/etc/motd', 68583));
SELECT lo_export(image.raster, '/tmp/motd') FROM image
    WHERE name = 'beautiful image';

create table t1 (i int, file_id OID);
insert into t1 values (1, lo_import('/home/zn/tmp/'));
select * from t1;
 i | file_id
---+---------
 1 |   16909
(1 row)

select lo_export(16909, '/home/zn/tmp/blob_output');
 lo_export
-----------
         1
(1 row)

\lo_list
      Large objects
  ID   |  Owner   | Description
-------+----------+-------------
 16909 | postgres |
(1 row)

select lo_unlink( 16909 );
 lo_unlink
-----------
         1
(1 row)

\lo_list
    Large objects
 ID | Owner | Description
----+-------+-------------
(0 rows)

-- the row still references the now-deleted large object:
select * from t1;
 i | file_id
---+---------
 1 |   16909
(1 row)

select lo_export(16909, '/home/zn/tmp/blob_output');
ERROR:  large object 16909 does not exist
Implementation
All large objects are stored in the system table "sys_largeobject". Each large object also has an entry in the system table "sys_largeobject_metadata". Large objects can be created, modified, and deleted through a file-like read/write API.
3.1 sys_catalog.sys_largeobject
// src/include/dbserver/kernel/meta_data/sys_largeobject.h
METADATA(_lob,2613,LargeObjectRelationId)
{
    Oid   loid;    /* Identifier of the large object */
    int32 pageno;  /* Page number within the object (counting from 0) */

    /* data has variable length, but we allow direct access; see inv_api.c */
    bytea data BKI_FORCE_NOT_NULL;  /* Data for page (may be zero-length) */
} FormData_largeobject;
The data size of each page is LOBLKSIZE (currently BLCKSZ/4, or 2 kB).
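Because each page holds exactly LOBLKSIZE bytes, a byte offset within a large object maps to a page (pageno) by integer division. A quick sketch of that mapping, assuming the default BLCKSZ of 8192 given later in this document:

```python
BLCKSZ = 8192
LOBLKSIZE = BLCKSZ // 4  # 2048 bytes of data per large-object page

def chunk_of(offset: int) -> tuple[int, int]:
    """Return (pageno, byte offset within that page) for an absolute byte offset."""
    return offset // LOBLKSIZE, offset % LOBLKSIZE

print(chunk_of(0))     # (0, 0)   -- first byte lives on page 0
print(chunk_of(2048))  # (1, 0)   -- exactly one page in: start of page 1
print(chunk_of(5000))  # (2, 904) -- page 2, 904 bytes into it
```

This is why a partial read or write of a large object only touches the pages that overlap the requested byte range.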
Name   | Type  | References                   | Description
-------+-------+------------------------------+------------------------------------------------------------
loid   | oid   | sys_largeobject_metadata.oid | Identifier of the large object that includes this page
pageno | int4  |                              | Number of this page within its large object (counting from zero)
data   | bytea |                              | Actual data stored in the large object. This will never be more than LOBLKSIZE bytes and might be less.
3.2 sys_catalog.sys_largeobject_metadata
// src/include/dbserver/kernel/meta_data/sys_largeobject_metadata.h
METADATA(_lob_meta,2995,LargeObjectMetadataRelationId)
{
    Oid oid;       /* row oid */
    Oid lomowner;  /* OID of the large object owner */

#ifdef METADATA_VARLEN  /* variable-length fields start here */
    aclitem lomacl[1];  /* access permissions */
#endif
} FormData_largeobject_metadata;
Name     | Type      | References     | Description
---------+-----------+----------------+------------------------------------------------------
oid      | oid       |                | Row identifier (hidden attribute; must be explicitly selected)
lomowner | oid       | sys_authid.oid | Owner of the large object
lomacl   | aclitem[] |                | Access privileges; see GRANT and REVOKE for details
LoDescData describes an open large object:
// src/include/dbserver/sdk/datatype/
typedef struct LoDescData
{
    Oid              id;        /* LO's identifier */
    Snapshot         snapshot;  /* snapshot to use */
    SubTransactionId subid;     /* owning subtransaction ID */
    uint64           offset;    /* current seek pointer */
    int              flags;     /* see flag bits below */

/* bits in flags: */
#define IFS_READLOCK  (1 << 0)  /* LO was opened for reading */
#define IFS_WRITELOCK (1 << 1)  /* LO was opened for writing */
} LoDescData;

/* Each "page" (tuple) of a large object can hold this much data */
#define LOBLOCKSIZE (BLCKSZ / 4)  /* 8192 / 4 = 2048 = 2 kB */

/* Maximum length in bytes of a large object */
#define MAX_LO_SIZE ((int64) INT_MAX * LOBLOCKSIZE)  /* ~2^31 * 2^11 = 2^42 bytes = 4 TB */
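The 4 TB limit quoted earlier falls directly out of these two constants: a page number is a 32-bit int, so a large object can have at most INT_MAX pages of LOBLOCKSIZE bytes each. A quick check of the arithmetic (note that because INT_MAX is 2^31 - 1, not 2^31, the true maximum is one page short of exactly 4 TiB):

```python
INT_MAX = 2**31 - 1
BLCKSZ = 8192
LOBLOCKSIZE = BLCKSZ // 4            # 2048 = 2**11 bytes per page
MAX_LO_SIZE = INT_MAX * LOBLOCKSIZE  # max pages * bytes per page

print(LOBLOCKSIZE)           # 2048
print(MAX_LO_SIZE)           # 4398046509056
print(MAX_LO_SIZE == 2**42 - LOBLOCKSIZE)  # True: one page under 4 TiB
```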
Interfaces
The following SQL functions are what a client uses to operate on large objects:
zn=# \df lo_*
                                 List of functions
   Schema    |     Name      | Result data type |    Argument data types    | Type
-------------+---------------+------------------+---------------------------+------
 sys_catalog | lo_close      | integer          | integer                   | func
 sys_catalog | lo_creat      | oid              | integer                   | func
 sys_catalog | lo_create     | oid              | oid                       | func
 sys_catalog | lo_export     | integer          | oid, text                 | func
 sys_catalog | lo_from_bytea | oid              | oid, bytea                | func
 sys_catalog | lo_get        | bytea            | oid                       | func
 sys_catalog | lo_get        | bytea            | oid, bigint, integer      | func
 sys_catalog | lo_import     | oid              | text                      | func
 sys_catalog | lo_import     | oid              | text, oid                 | func
 sys_catalog | lo_lseek      | integer          | integer, integer, integer | func
 sys_catalog | lo_lseek64    | bigint           | integer, bigint, integer  | func
 sys_catalog | lo_open       | integer          | oid, integer              | func
 sys_catalog | lo_put        | void             | oid, bigint, bytea        | func
 sys_catalog | lo_tell       | integer          | integer                   | func
 sys_catalog | lo_tell64     | bigint           | integer                   | func
 sys_catalog | lo_truncate   | integer          | integer, integer          | func
 sys_catalog | lo_truncate64 | integer          | integer, bigint           | func
 sys_catalog | lo_unlink     | integer          | oid                       | func
(18 rows)
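Note that the bytea-oriented functions in this list (lo_from_bytea, lo_put, lo_get) allow reading and writing a large object in a single call, without the open/seek/close protocol. A sketch using those signatures (OID 0 asks the server to assign one; the returned OID 16913 is illustrative):

```sql
SELECT lo_from_bytea(0, '\xdeadbeef');  -- create a new LO from a bytea value, returns its OID
SELECT lo_put(16913, 2, '\x0102');      -- overwrite 2 bytes starting at offset 2
SELECT lo_get(16913);                   -- read the whole object as bytea
SELECT lo_get(16913, 1, 2);             -- read 2 bytes starting at offset 1
```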
In sys_proc.dat, each SQL function is associated with its backend function:
# src/include/dbserver/kernel/meta_data/sys_proc.dat
{ oid => '952', descr => 'large object open',
  proname => 'lo_open', provolatile => 'v', proparallel => 'u',
  prorettype => 'int4', proargtypes => 'oid int4', prosrc => 'be_lo_open' },
* Client-side functions:
  src/dbconnector/libkci/front_lobj.c: lo_create()...
* Server-side functions:
  src/dbserver/kernel/comm/libkci/backend_fsstubs.c: be_lo_create()...
vacuumlo
Deleting a row that references a large object, or dropping the table, does not delete the large object itself. To delete such "orphaned" large objects, use the vacuumlo command.
Implementation principle:
- Collect the OIDs of all large objects in the database into a temporary table vacuum_l
- Query the database for every table column whose type can hold a large object reference, returning schema, table, and column name
- For each such column, delete the referenced OIDs from vacuum_l
- The OIDs left in vacuum_l are orphaned large objects; delete them from the system
CREATE TEMP TABLE vacuum_l AS
    SELECT oid AS lo FROM sys_largeobject_metadata;
ANALYZE vacuum_l;

SELECT s.nspname, c.relname, a.attname
FROM sys_class c, sys_attribute a, sys_namespace s, sys_type t
WHERE a.attnum > 0
  AND NOT a.attisdropped
  AND a.attrelid = c.oid
  AND a.atttypid = t.oid
  AND c.relnamespace = s.oid
  AND t.typname in ('oid', 'lo', 'blob', 'clob')
  AND c.relkind in ('r', 'm')
  AND s.nspname !~ '^sys_';

DELETE FROM vacuum_l WHERE lo IN (SELECT %s FROM %s.%s);  -- column, schema, table

-- 1000 rows are deleted each time
begin;
DECLARE myportal CURSOR WITH HOLD FOR SELECT lo FROM vacuum_l;
while (1)
    FETCH FORWARD 1000 IN myportal;
    -- for each row in the result set, call lo_unlink(conn, lo);
    commit;  -- commit once per 1000 deleted large objects
    begin;
commit;
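The control flow above is essentially a set difference followed by batched deletion. A minimal sketch of that logic in pseudo-client code (unlink is a stand-in for the real lo_unlink() client call; the commit-per-batch behavior is shown only as a comment):

```python
def vacuumlo(all_lo_oids, referenced_oids, unlink, batch=1000):
    """Delete orphaned large objects in batches, mimicking vacuumlo's loop."""
    # Everything in vacuum_l minus every OID still referenced by some column.
    orphans = sorted(set(all_lo_oids) - set(referenced_oids))
    deleted = 0
    for i in range(0, len(orphans), batch):
        for oid in orphans[i:i + batch]:
            unlink(oid)  # lo_unlink(conn, oid) in the real tool
            deleted += 1
        # the real vacuumlo commits here, once per `batch` unlinks
    return deleted

removed = []
print(vacuumlo([16909, 16910, 16911], [16910], removed.append))  # 2
print(removed)  # [16909, 16911]
```

Batching the commits bounds the amount of work lost if the tool is interrupted, at the cost of leaving some orphans deleted even when a later batch fails.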
Limitations of vacuumlo: as an external tool it needs a username and password to connect to the database, and it must be scheduled externally, for example with cron.