In Kimball's dimensionally modeled data warehouse, the Multidimensional Architecture (MD) rests on three key concepts: Bus Architecture, Conformed Dimension, and Conformed Fact.
Bus Architecture
In the data warehouse field, one approach to building a data warehouse is the Multidimensional Architecture (MD), also known as the bus architecture.
The multidimensional architecture was founded by Dr. Ralph Kimball, one of the best-known practitioners in the data warehouse field.
The multidimensional architecture has two main parts: the Back Room and the Front Room. The back room, also called the Staging Area, is the core component of the MD architecture. It is where conformed dimensions are created, stored, and distributed, and it is also where surrogate keys are generated. The front room is the external interface of the MD architecture and contains two main kinds of data mart: atomic data marts and aggregated data marts.
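Surrogate key generation in the back room can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the class name `SurrogateKeyGenerator`, the `(source_system, source_id)` form of the natural key, and the sample rows are all assumptions made for the example.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hypothetical helper: one integer surrogate key per natural key."""

    def __init__(self):
        self._next_key = count(start=1)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing key if this natural key was seen before;
        # otherwise allocate the next integer.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next_key)
        return self._by_natural_key[natural_key]


product_keys = SurrogateKeyGenerator()

# Assumed staging rows: (source_system, source_id) is the natural key.
staged_rows = [("crm", "P-100"), ("erp", "P-100"), ("crm", "P-200"), ("crm", "P-100")]
assigned = [product_keys.key_for(row) for row in staged_rows]
```

The same natural key always maps to the same surrogate key, so every data mart that receives the dimension refers to a member by one identical key.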
An atomic data mart stores detailed data at the lowest level of granularity, organized in a star schema. An aggregated data mart usually holds data at a coarser granularity than an atomic data mart, but it also stores its data in a star schema.
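The difference in granularity can be illustrated with a small sketch. The field names (`date`, `product_key`, `amount`) and the monthly roll-up are assumptions chosen for the example; an actual aggregated data mart may summarize along other dimensions.

```python
from collections import defaultdict

# Atomic grain (assumed): one row per individual sale.
atomic_sales = [
    {"date": "2024-01-15", "product_key": 1, "amount": 10.0},
    {"date": "2024-01-20", "product_key": 1, "amount": 5.0},
    {"date": "2024-02-03", "product_key": 1, "amount": 7.0},
    {"date": "2024-02-03", "product_key": 2, "amount": 3.0},
]

# Aggregated grain: one row per (month, product).
monthly_sales = defaultdict(float)
for row in atomic_sales:
    month = row["date"][:7]  # "YYYY-MM" prefix of the ISO date
    monthly_sales[(month, row["product_key"])] += row["amount"]

monthly_sales = dict(monthly_sales)
```

The aggregated rows carry the same facts, only summed over a coarser date grain.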
The front room also includes services such as query management and activity monitoring, which help maintain the performance and quality of the data warehouse.
In a multidimensional architecture, all of these star-schema data marts can physically reside in a single database instance or be spread across different machines. Taken together, they make up a distributed data warehouse.
Conformed Dimension
A multidimensional architecture has no single physical data warehouse. Instead, the physical data marts together make up a logical data warehouse. The data marts can be built step by step and eventually combined into a data warehouse.
If something goes wrong in this step-by-step process, a data mart becomes an isolated mart that cannot be combined into the data warehouse. Conformed dimensions were proposed to solve this problem.
The scope of conformed dimensions is the set of dimensions in the bus architecture, that is, the dimensions that may appear in more than one data mart. Deciding this scope is the architect's responsibility.
In content, a conformed dimension is essentially no different from an ordinary dimension: both are the result of data cleansing and integration. Conformed dimensions are built in the Back Room of the multidimensional architecture, that is, in the staging area.
A project team building a multidimensional data warehouse needs a dedicated dimension designer, whose responsibility is to build the conformed dimensions and keep them consistent. Dimensions built in the back room are replicated synchronously to every data mart.
In this way, all data marts have exactly the same dimensions. When a new data mart is built, the conformed dimensions must first be processed in the back room: the existing conformed dimensions are extended or adjusted as the situation requires, and the result is then replicated synchronously to every data mart.
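The replication step can be sketched as below. This is only an in-memory illustration: `publish_dimension`, the `marts` dictionary, and the `dim_product` rows are hypothetical names, and a real back room would normally copy tables between databases rather than Python lists.

```python
import copy


def publish_dimension(master_rows, marts, name):
    """Copy the back-room master dimension into every data mart."""
    for mart in marts.values():
        # Each mart receives its own physical copy of the same rows.
        mart[name] = copy.deepcopy(master_rows)


# Assumed master copy maintained in the back room.
master = [
    {"product_key": 1, "product_name": "Widget"},
    {"product_key": 2, "product_name": "Gadget"},
]
marts = {"sales": {}, "inventory": {}}

publish_dimension(master, marts, "dim_product")
```

After publishing, every mart holds a physically separate but identical copy of the dimension.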
This is the key to keeping dimensions consistent across data marts. Within a single data mart, conformance means that if two dimensions are related, they are either exactly the same or one is a mathematical subset of the other.
For example, if you create a month dimension, its descriptive attributes must be exactly the same as those in the date dimension. The most common approach is to define the month dimension as a view on the date dimension. The month dimension is then a subset of the date dimension and stays consistent during later drill-down and similar operations. If the dimension table is large, a materialized view or a physical table should be used instead for efficiency. Once the dimensions are conformed, the facts can be stored in each data mart. Although the data marts are physically independent, the conformed dimensions connect them logically, so drill-across and similar operations can be performed at any time, and together the marts form a data warehouse.
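The month-dimension-as-view approach described above can be sketched with Python's built-in `sqlite3` module. The table and column names (`dim_date`, `dim_month`, `month_key`, and so on) are assumptions for illustration. SQLite has no materialized views, so a large dimension would need a physical table or a database that supports them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (
        date_key   INTEGER PRIMARY KEY,
        full_date  TEXT    NOT NULL,
        month_key  INTEGER NOT NULL,
        month_name TEXT    NOT NULL,
        year       INTEGER NOT NULL
    );
    INSERT INTO dim_date VALUES
        (20240130, '2024-01-30', 202401, 'January',  2024),
        (20240131, '2024-01-31', 202401, 'January',  2024),
        (20240201, '2024-02-01', 202402, 'February', 2024);

    -- The month dimension is derived from the date dimension, so its
    -- attributes cannot drift away from the ones in dim_date.
    CREATE VIEW dim_month AS
    SELECT DISTINCT month_key, month_name, year
    FROM dim_date;
    """
)

months = conn.execute(
    "SELECT month_key, month_name, year FROM dim_month ORDER BY month_key"
).fetchall()
conn.close()
```

Because `dim_month` is computed from `dim_date`, any correction to a month attribute in the date dimension appears in the month dimension automatically.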
Conformed Fact
When building multiple data marts, conforming the dimensions accounts for 80% to 90% of the conformance work. What remains is to establish conformed facts. Conformed facts differ somewhat from conformed dimensions. Conformed dimensions are maintained by a dedicated person in the Back Room, and any change is replicated synchronously to every data mart. Fact tables, by contrast, are generally not copied between data marts. When facts from several data marts must be queried together, this is usually done by drilling across.
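A drill-across query can be sketched as follows: each data mart is queried separately, grouped by the same conformed dimension attribute, and the two result sets are then merged on that attribute. The marts, the `dim_product` contents, and the fact rows here are invented for the example.

```python
from collections import defaultdict

# Conformed product dimension (assumed), identical in both data marts.
dim_product = {1: "Widget", 2: "Gadget"}

# Fact rows stay in their own marts: (product_key, measure).
sales_facts = [(1, 100.0), (2, 50.0), (1, 25.0)]  # sales amount
inventory_facts = [(1, 40), (2, 10)]              # quantity on hand


def summarize(facts):
    """Query one mart: total the measure per product name."""
    totals = defaultdict(float)
    for product_key, value in facts:
        totals[dim_product[product_key]] += value
    return dict(totals)


sales = summarize(sales_facts)
inventory = summarize(inventory_facts)

# Drill across: merge the two answer sets on the conformed attribute.
report = {
    name: (sales.get(name, 0.0), inventory.get(name, 0.0))
    for name in sorted(set(sales) | set(inventory))
}
```

The merge only lines up because both marts label their rows with the same product names from the conformed dimension.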
To make drill-across possible between data marts, conformed facts must guarantee two things. First, the definition and calculation method of each KPI must be consistent. Second, the units of the facts must be consistent. If business requirements dictate otherwise, or the facts by nature cannot be made consistent, it is recommended to store facts with different units in separate fields.
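Both points can be illustrated with a small sketch. The `SalesFact` fields and the gross-margin KPI are assumptions chosen for the example: quantities measured in different units get separate fields, and the KPI is defined once and shared by every data mart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    product_key: int
    quantity_cases: int  # quantity counted in cases
    quantity_units: int  # quantity counted in individual units
    revenue: float
    cost: float


def gross_margin(facts):
    """Single shared KPI definition: (revenue - cost) / revenue."""
    revenue = sum(f.revenue for f in facts)
    cost = sum(f.cost for f in facts)
    return (revenue - cost) / revenue


facts = [
    SalesFact(1, 2, 24, 100.0, 60.0),
    SalesFact(2, 1, 12, 100.0, 90.0),
]
margin = gross_margin(facts)
total_cases = sum(f.quantity_cases for f in facts)
total_units = sum(f.quantity_units for f in facts)
```

Keeping the two quantities in separate fields means a query never adds cases to units by accident.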
In this way, conformed dimensions tie the data marts together, conformed facts ensure that fact data from different data marts can be drilled across, and the result is a distributed data warehouse.