The open metadata type system
Knowledge about data is spread amongst many people and systems. One of the roles of a metadata repository is to provide a place where this knowledge can be collected and correlated, with as much automation as possible. To enable different tools and processes to populate the metadata repository, we need agreement on what data should be stored and in what format (structures).
Open metadata subject areas
Supporting a wide range of metadata management and governance tasks requires several different subject areas of metadata, which are summarized in the table below. This metadata may be spread across different metadata repositories that each specialize in particular use cases or communities of users.
|Area 0||describes base types and infrastructure. This includes the root type for all open metadata entities called
|Area 1||collects information from people using the data assets. It includes their use of the assets and their feedback. It also manages crowd-sourced enhancements to the metadata from other areas before they are approved and incorporated into the governance program.|
|Area 2||describes the data assets. These are the data sources, APIs, analytics models, transformation functions and rule implementations that store and manage data. The definitions in Area 2 include connectivity information that is used by the open connector framework (and other tools) to get access to the data assets.|
|Area 3||describes the glossary. This contains the definitions of terms and concepts and how they relate to one another. Linking the concepts/terms defined in the glossary to the data assets in Area 2 defines the meaning of the data that is managed by the data assets. This is a key relationship that helps people locate and understand the data assets they are working with.|
|Area 4||defines how the data assets should be governed. This is where the classifications, policies and rules are defined.|
|Area 5||is where standards are established. This includes data models, schema fragments and reference data that are used to assist developers and architects in using best practice data structures and valid values as they develop new capabilities around the data assets.|
|Area 6||provides the additional information that automated metadata discovery engines have gathered about the data assets. This includes profile information, quality scores and suggested classifications.|
|Area 7||provides the structures for recording lineage and providing traceability to the business.|
The following diagram provides more detail of the metadata structures in each area and how they link together:
Metadata is highly interconnected
Bottom left is Area 0 - the foundation of the open metadata types along with the IT infrastructure that digital systems run on, such as platforms, servers and network connections. Sitting on the foundation are the assets. The base definition for Asset is in Area 0 but Area 2 (middle bottom) builds out common types of assets that an organization uses. These assets are hosted and linked to the infrastructure described in Area 0. For example, a data set could be linked to the file system description to show where it is stored.
Area 5 (right middle) focuses on defining the structure of data and the standard sets of values (called reference data). The structure of data is described in schemas and these are linked to the assets that use them.
Many assets have technical names. Area 3 (top middle) captures business and real world terminologies and organizes them into glossaries. The individual terms described can be linked to the technical names and labels given to the assets and the data fields described in their schemas.
Area 6 (bottom right) holds additional metadata produced by automated analysis of data. These analysis results are linked to the assets that hold the data so that data professionals can evaluate the suitability of the data for different purposes. Area 7 (left middle) captures the lineage of assets from a business and technical perspective. Above that, in Area 4, are the definitions that control the governance of all of the assets. Finally, Area 1 (top right) captures information about users (people and automated processes), their organization (such as teams and projects), and their feedback.
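The links described above can be pictured as a small graph. The following is only an illustrative sketch: the `Element` and `MetadataGraph` classes, the link labels and the element names are all hypothetical and are not the open metadata type definitions themselves. It shows how one data set in Area 2 ends up connected to elements from several other areas.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    """A hypothetical metadata element tagged with its subject area."""
    area: int
    kind: str
    name: str


@dataclass
class MetadataGraph:
    """A toy store of (source, label, target) links between elements."""
    links: List[Tuple[Element, str, Element]] = field(default_factory=list)

    def link(self, source: Element, label: str, target: Element) -> None:
        self.links.append((source, label, target))

    def neighbours(self, element: Element) -> List[Tuple[str, Element]]:
        """Return every element linked to `element`, in either direction."""
        found = []
        for source, label, target in self.links:
            if source == element:
                found.append((label, target))
            elif target == element:
                found.append((label, source))
        return found


# Hypothetical elements, one per area mentioned in the walk-through.
file_system = Element(0, "infrastructure", "shared-file-system")
data_set = Element(2, "asset", "CUST_TXN")
schema = Element(5, "schema", "customer-transaction-schema")
term = Element(3, "glossary term", "Customer Transaction")
profile = Element(6, "analysis result", "column-profile")

graph = MetadataGraph()
graph.link(data_set, "stored on", file_system)   # Area 2 -> Area 0
graph.link(data_set, "structured by", schema)    # Area 2 -> Area 5
graph.link(term, "gives meaning to", data_set)   # Area 3 -> Area 2
graph.link(profile, "describes", data_set)       # Area 6 -> Area 2

# Areas reachable in one step from the data set.
areas = sorted({other.area for _, other in graph.neighbours(data_set)})
print(areas)  # [0, 3, 5, 6]
```

Starting from a technical asset name such as `CUST_TXN`, a single hop reaches its storage location, its structure, its business meaning and its analysis results, which is the kind of navigation the linked areas are meant to support.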
Within each area, the definitions are broken down into numbered packages to help identify groups of related elements. The numbering system relates to the area that the elements belong to. For example, Area 1 has models 0100-0199, Area 2 has models 0200-0299, etc. Each area's sub-models are dispersed along its range, ensuring there is space to insert additional models in the future.
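The numbering convention can be sketched in a few lines of Python. The helper names `model_range` and `area_of_model` are hypothetical, and the sketch assumes that the pattern given for Areas 1 and 2 extends to every area from 0 to 7, so that Area 0 would use 0000-0099.

```python
from typing import Tuple

# Area names as they appear in the tables of this page.
AREA_NAMES = {
    0: "Base Types, Systems and Infrastructure",
    1: "Collaboration",
    2: "Data Assets",
    3: "Glossary",
    4: "Governance",
    5: "Models and Reference Data",
    6: "Metadata Discovery",
    7: "Lineage",
}


def model_range(area: int) -> Tuple[str, str]:
    """Return the first and last four-digit model number for an area.

    Assumes every area follows the 0N00-0N99 pattern.
    """
    if area not in AREA_NAMES:
        raise ValueError(f"unknown area: {area}")
    first = area * 100
    return f"{first:04d}", f"{first + 99:04d}"


def area_of_model(model_id: str) -> int:
    """Return the area that a four-digit model number belongs to."""
    if len(model_id) != 4 or not model_id.isdigit():
        raise ValueError(f"expected a four-digit model number: {model_id!r}")
    area = int(model_id) // 100
    if area not in AREA_NAMES:
        raise ValueError(f"model {model_id} is outside Areas 0-7")
    return area


print(model_range(2))                      # ('0200', '0299')
print(AREA_NAMES[area_of_model("0710")])   # Lineage
```

Because the area is encoded in the leading digits, a model number alone is enough to tell which subject area a definition belongs to.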
Test yourself ...
Fill in the following table to map the areas of the open metadata type system to the different categories of metadata.
|Open Metadata Area||Categories of metadata covered by this Area - choose from Technical metadata, Data content analysis results, Consumer metadata, Subject area materials, Governance metadata, Organizational metadata, Business context metadata, Process metadata and Operational metadata.|
|0 - Base Types, Systems and Infrastructure|
|1 - Collaboration|
|2 - Data Assets|
|3 - Glossary|
|4 - Governance|
|5 - Models and Reference Data|
|6 - Metadata Discovery|
|7 - Lineage|
|Open Metadata Area||Categories of metadata covered by this Area|
|0 - Base Types, Systems and Infrastructure||Technical Metadata|
|1 - Collaboration||Consumer Metadata, Organizational Metadata|
|2 - Data Assets||Technical Metadata|
|3 - Glossary||Subject Area Materials|
|4 - Governance||Subject Area Materials, Governance Metadata, Operational Metadata (associated assets)|
|5 - Models and Reference Data||Technical Metadata, Subject Area Materials|
|6 - Metadata Discovery||Data Content Analysis Results|
|7 - Lineage||Business Context Metadata, Process Metadata|