Building Infinitely Scalable Data Storage Capabilities with Eav Models

give an example

Let's say we want to do an e-commerce merchandise management, and we start by selling some clothes, and we need to manage the clothes'fitting (of apparel)、color、eleganceWaiting for information, one day need to sell the computer, the computer needs to bemotherboard (computer) (lit. lord board)、CPU、display card (computer)、random access memory (RAM)、hard disk、radiator Waiting for information, need to sell my cell phone again in a few days, the phone hascolor、releases、storage capacity、Package TypeWait for information on how the database is designed.

Option 1: New fields

Every time you add a new product, you keep adding fields if you need to support different information.

ID	Name	fitting (of apparel)	color	elegance	motherboard (computer) (lit. lord board)	CPU	display card (computer)	random access memory (RAM)	stockpile	radiator	releases	Package Type
1	T-shirt	M	fig. reactionary	Pleasant Goat	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL
2	alien computer	NULL	NULL	NULL	A	i99	RTX8090	32G	2T	water cooling	NULL	Keyboard and Mouse Sets
3	Banana cell phone	NULL	Colorful Black	NULL	NULL	Xiaolong999	NULL	12G	512G	fan (loanword)	Pro Plus MAX	Shattered Screen Insurance Package

This causes the following problems:

It is costly to implement, and every time you add a product, you need to do front-end and back-end development and debugging, wasting time and labor.
Requires dynamic changes to the table structure, which is too insecure in a production environment.
Waste of resources, the database may have more and more fields, and many of the fields are not needed for most of the commodities, and need to be set up asNULL, resulting in a large waste of memory.
Expansion is average and capped.

Option 2: Reserved fields

Define a few reserved fields for the corresponding table, and then these reserved fields can be reused for different products. This will solve part of the development problem, most cases do not need to develop, directly reuse existing fields can be.

ID	Name	Ext1	Ext2	Ext3	Ext4	Ext5	Ext6	Ext7
1	T-shirt	M	fig. reactionary	Pleasant Goat	NULL	NULL	NULL	NULL
2	alien computer	A	i99	RTX8090	32G	2T	water cooling	Keyboard and Mouse Sets
3	Banana cell phone	Colorful Black	Xiaolong999	12G	512G	fan (loanword)	Pro Plus MAX	Shattered Screen Insurance Package

But there are a lot of problems with that:

The fields are the same, but the meanings are different, requiring a lot of front-end adaptation.
The types of fields may be different, and reserved fields also have to be considered differently.
Too few reserved fields are of limited use, and too many have the same storage and performance issues as new fields.
Extended fields are public and can't be named according to the field name, you have to maintain the correspondence when you enable it and look up the correspondence when you use it.
The number of extension fields cannot be precisely defined.

Option 3: JSON_OBJECT

A simple solution in this case is to design an extra field, of type JSON, and put the size, color, shade, motherboard, CPU, graphics card, memory, etc. into JSON.

ID	Name	Ext1
1	T-shirt	`{ "size": "M", "color": "white", "style": "Pleasant Goat" }`
2	alien computer	`{ "motherboard": "A", "cpu": "i9 11900", "graphics": "rtx8090", "memory": "32G", "storage": "2T", "cooling": "water-cooled", "package_type": "keypad_kit" }`
3	Banana cell phone	`{ "color": "Colorful Black", "CPU": "Snapdragon 999", "version": "Pro Plus MAX", "memory": "12G", "storage": "512G", "cooling": "fan", "package type": "Shattered Screen Insurance Package" }`

But JSON has several problems:

JSON data can only be used for display, if used for conditional query, data update its efficiency is very low. When querying, you need to traverse the table to parse the JSON.
Although MySQL supports JSON type, MySQL, as a relational database, has better support for standardized column-per-value, including data type restriction, length restriction, unique index restriction, query index optimization, foreign key associations, correlation query support, and arithmetic support, etc., which cannot be achieved by Key in JSON.
The commonly used query fields from the JSON data to form a separate field, although it can improve the query problem, but you need to have the foresight, if the later peeling will involve code modification and data migration, encountered in multiple versions, there may also be data redundancy problems, poorly handled data inconsistency problems, and not just so simple, must be used with caution.
Poor parsing performance for large JSON.
Each piece of data needs to save Key and Value at the same time, and for Chinese data, pure JSON takes up too much space. Based on the above reasons, it is not recommended to use JSON to store data in some complex cases. The above three are not ideal solutions, and after continuous experience, we have come up with the EAV model, which can solve the above problems to a certain extent.

What is an EAV model?

Entity-Attribute-Value (wiki) is a type ofdata model, which is used to encode entities in a space-saving manner.

Entity: an entity that represents a business object, such as a commodity in the example above.
Attribute: an attribute of an object. Attributes are not stored as a separate column of the entity, but are stored in a separate set of database tables.
Value: refers to the value associated with the particular property.

A few concepts

Sparseness of Attributes (Sparseness of Attributes)

In mathematics and computer science, if an object contains only a few properties out of a large number of potential properties, called "sparse matrix (math)". In the discussion of EAV, "sparse" is used to describe most of the attributes that have no value. For example, the motherboard, CPU, graphics card, etc. attributes in the above section.

Row Modeling

Tables based on the row model have multiple rows of data records describing the entity, with each new set of data stored in the database as additional rows rather than additional columns. The row model is the standard for database designdata modelingtechnology, it is only applicable if the following two conditions are met:

Entity-specific data is sparse.
Data is constantly changing.

The row model is not applicable to the case of sparse and non-volatile data, in this case, the traditional column model should be used.

EAV with line model

The EAV model is a generalization of the row model. The data in a table of the row model is homogeneous, which means that all types of data are stored in one table for the entire database; moreover, the data types of the value columns in a table of the row model are predetermined. In an EAV table, the data type of a particular row and its value is determined by the corresponding attribute.

Choosing the best method for data modeling is difficult, but as a guideline, e.g., use an EAV model rather than a row model when the following conditions are met:

Individual attributes in a data record have different data types; it is difficult to store different types of values in a table when using the row model.
There is a need to represent multiple types of data, the number of which may fluctuate. At the same time, there is very little data of each type, if not sparse attributes. In this case, a traditional data model would have hundreds of tables with only a few rows of data.
In certain environments, whose attributes must be created dynamically, certain classes are often omitted in subsequent cycles of the prototype.
Some entities have a mixture of attributes, which means that some attributes are non-sparse while others are highly sparse. In this case, the non-sparse attributes are stored in traditional tables, while the sparse attributes are stored in EAV or row modeling format.

Table Structure Design

Option 1: Row Model

Let's start with a simple scenario using row modeling. The product is stored as an entity in the Goods table, and the rest of the attributes and values are stored in the Attribute table of the product.

Goods

GoodsID	Name
1	T-shirt
2	alien computer
3	Banana cell phone

Commodity Attribute Table (Attribute)

GoodsID	Attribute	Value
1	fitting (of apparel)	M
1	color	fig. reactionary
1	elegance	Pleasant Goat
2	motherboard (computer) (lit. lord board)	A
2	CPU	i99
2	display card (computer)	RTX8090
2	random access memory (RAM)	32G
2	stockpile	2T
2	radiator	water cooling
2	Package Type	Keyboard and Mouse Sets
3	color	Colorful Black
3	CPU	Snapdragon 999
3	random access memory (RAM)	12G
3	stockpile	512G
3	radiator	fan (loanword)
3	releases	Pro Plus MAX
3	Package Type	Shattered Screen Insurance Package

The above scheme is the row model scheme, by which we can extend the attributes of the goods infinitely. This scheme is more suitable for the case where the attributes of each entity are not fixed.

But there are several problems with this program:

Each piece of data needs to save Key and Value at the same time, for Chinese data, Key is limited and rows are unlimited, it will waste a lot of space and the performance is also poor.
Value has no type restriction, it is all VARCHAR, which is not database friendly, leads to memory waste, and requires data format conversion for access. Indexes created on values stored as strings do not allow search range optimization for numeric and date types, which is a common problem with key-value pairs describing data with mixed data types.

Option 2: Simple EAV

Optimization for the first problem of Scenario 1.

The commodity table remains unchanged. The commodity attribute table is split into an attribute table and a value table.

Attribute

AttributeID	Name
1	color
2	fitting (of apparel)
3	elegance
4	motherboard (computer) (lit. lord board)
5	CPU
6	display card (computer)
7	random access memory (RAM)
8	stockpile
9	radiator
10	releases
11	Package Type

Table of values (Value)

GoodsID	AttributeID	Value
1	2	M
1	1	fig. reactionary
1	3	Pleasant Goat
2	4	A
2	5	i99
2	6	RTX8090
2	7	32G
2	8	2T
2	9	water cooling
2	11	Keyboard and Mouse Sets
3	1	Colorful Black
3	5	Snapdragon 999
3	7	12G
3	8	512G
3	9	fan (loanword)
3	10	Pro Plus MAX
3	11	Shattered Screen Insurance Package

This way the attributes are kept in the attribute table, and each attribute value is kept in the value table, and associations are made with both the product table and the attribute table, which saves a lot of memory.

Option 3: Optimize EAV

In order to solve the second problem of solution 1, the table of values needs to be partitioned based on data types, with each different data type split into a separate table, and the types added through the Attribute table to determine where to access the data.

Attribute

AttributeID	Name	Type
1	color	VARCHAR
2	fitting (of apparel)	INT
3	elegance	INT
4	motherboard (computer) (lit. lord board)	VARCHAR
5	CPU	INT
6	display card (computer)	INT
7	random access memory (RAM)	INT
8	stockpile	INT
9	radiator	VARCHAR
10	releases	TEXT
11	Package Type	VARCHAR

a gauge

eav_int

GoodsID	AttributeID	Value(INT)
1	2	2(M)
1	3	1 (Pleasant Goat)
2	5	99(i99)
2	6	8090(RTX8090)
2	7	32(32G)
2	8	2048(2T)
3	5	999 (Snapdragon 999)
3	7	12(12G)
3	8	512(512G)

eav_varchar

GoodsID	AttributeID	Value(VARCHAR)
1	1	fig. reactionary
2	4	A
3	1	Colorful Black
3	9	fan (loanword)
3	11	Shattered Screen Insurance Package
2	11	Keyboard and Mouse Sets

eav_text

GoodsID	AttributeID	Value(TEXT)
3	10	Pro Plus MAX

This is the more standard EAV model. The two problems of option 1 are solved.

Advantages/disadvantages of EAV

vantage

The main advantage of the EAV model is its flexibility. There is no limit to the number of attribute description tables, which means that the data structure does not need to be redesigned each time an attribute is added; when expanding the database, the number of attributes can be increased vertically without changing the data structure.

EAV only deals with non-null attributes and does not need to reserve additional storage space for null values. This makes the EAV model quite space efficient.

The physical data format is very clean, similar to JSON/XML, and it is easy to map the data to JSON/XML format.

The EAV model can scale applications extremely well and quickly because it prevents the consequences of constant changes (in attributes). New data of any structure can simply be recorded without modifying any data structure.

drawbacks

When considering EAV, it is important to determine whether the data is sparse and the amount of data is important because the complexity of the EAV design outweighs the benefits when using inappropriate data sets. Traditional table structures are more appropriate for relatively static or simple data.

One of the major drawbacks of EAV compared to traditional data structures is that it is less efficient in retrieving large volumes of data. In the EAV model, the data is more dispersed, so querying the records of a complete entity requires multiple table JOIN queries. More importantly, when the EAV model is applied to large data volumes, matrix pivoting (row-column transformation) needs to be performed either briefly or permanently for the same set of data descriptions modeled by EAV. This operation is error-prone and CPU-intensive.

Another limitation of the EAV model requires the development of additional logic to accomplish transactions that are automated under traditional data structures. However, the cost of such work can be reduced by utilizing existing EAV tools.

Finally, understanding the EAV model does take time. It has a definite learning curve, which makes it so that junior developers need to put more effort into it before they really understand its concepts.

reach a verdict

The following conditions should be considered when applying the EAV model:

Data is sparse and heterogeneous, with a wide range of attributes for an entity, and new attributes are often introduced.
The amount of data is very large, with many different types of data, even if the attributes are non-sparse.
There are many hybrid attributes that have both sparse and non-sparse attributes. Typically, not all data classes satisfy the requirements for EAV modeling.

In practice, in production environments, mixed schema is often used, including traditional relationships, EAV, or a suitable hybrid approach.

Magento 2 Database EAV Model Structure

Magento is an open source e-commerce platform written in the PHP programming language; it is mainly oriented to enterprise-level applications that can handle all aspects of the needs of the world's number one e-commerce system. According to statistics, the global e-commerce site built using magento accounted for up to 31.4%!

Table

eav_entity: (E) Table of entities.
eav_attribute: (A) Attribute table.
eav_entity_{type}: (V) Table of values.
eav_entity_attribute: Attributes of all entities.
eav_entity_type: Type Entity Table, which can support multiple entities, so you need to distinguish between entity types.
eav_attribute_set: A collection of attributes that can aggregate some attributes of a kind of entity.

Table of values type

int: The int type corresponding to the mysql field.
varchar: corresponds to the varchar type of the mysql field.
text: corresponds to the text type of the mysql field.
decimal: corresponds to the decimal type of the mysql field.
datetime: Corresponds to the datetime type of the mysql field.

Some field meanings

entity_id: A unique identifier for the product.
attribute_id: It is the unique identifier for each attribute and the primary key for this table.
value: The value of the attribute.
value_id: Unique identifier and primary key for the table of values.
entity_type_id: The Entity table supports different entity types such as Merchandise, Customer, Order, etc., which are distinguished by this field.
attribute_set_id: Product attributes can be formed locally into different attribute sets. Attribute sets allow for greater flexibility in the product structure because products do not have to use all the attributes that have been defined.
type_id: Product types, each with different settings and features .
sku: A Stock Keeping Unit (SKU) can be a string of numbers or a combination of letters that uniquely identifies a product sold on a website and is defined by the user.
has_options: Used to identify whether the product has customization options (such as drop-down lists, text boxes, etc.).
required_options: Used to identify whether a custom item is required or mandatory.
entity_type_id: This item associates the attribute with a specific EAV model type.
attribute_code: This item is the name or key name of the attribute used to generate the getter and setter methods of the magic method.
backend_model: The backend model is used for the management of loading or storing data into the database.
backend_type: This item specifies whether the attribute is not stored in the default EAV table but in a specific table.
frontend_model: The front-end model is responsible for how the attribute element is output on the browser.
frontend_input: Similar to the front-end model, the front-end input specifies the type of item to be entered on the browser.
frontend_label: This item is the label or name that the property outputs on the browser.
source_model: Source models are used to pass possible values to an attribute, Magento comes with a number of predefined source models such as country, yes or no values, region, and so on.

consultation

Understanding the EAV data model and when to use it
Entity-Attribute-Value (EAV) model | Alan Hou's personal blog
Several implementations of custom fields