Schema
A schema describes the structure of the data associated with an Asset. The technology that supports the asset often limits the structural choices for data. For example:
- A relational database organizes data into collections of tables and columns.
- Technologies such as JSON or XML organize data into nested structures.
- Graph databases organize data into nodes and relationships.
These differences need to be represented in the Open Metadata Types. At the same time, data governance is concerned with the accuracy and appropriate use of individual data values. Governing each data item individually would be very expensive, so data governance practices aim to group like data together so that it can be governed in a consistent way. As such, the open metadata types provide a root set of types that all the specific schema structures inherit from.
Schema Elements
In open metadata, a schema is described using a linked subgraph of Schema Elements. A schema begins with a schema element called a Schema Type. Each data field described by the schema is represented by a Schema Attribute (think of this as a variable) with its own schema type. This schema type describes the structure of the data associated with the schema attribute.
In the early versions of Egeria, the schema attribute and the schema type were represented as two separate entities in the open metadata types with a SchemaTypeForAttribute relationship to connect them together. This is shown in figure 1.
Figure 1: Original model for SchemaAttribute and its SchemaType
However, it became obvious that, since these two elements need to be retrieved together, it is much more efficient to represent the schema type as a classification on the SchemaAttribute, because classifications are typically stored, distributed and retrieved with their entity. The new classification is called TypeEmbeddedAttribute, and it contains all the properties found in the schema types plus a typeName property to identify the corresponding schema type.
Figure 2 shows the new types for representing a schema attribute and its type.
Figure 2: Collapsing SchemaAttribute and SchemaType into an entity with a classification
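The collapse described above can be sketched in a few lines of Python. This is a minimal illustration, not Egeria code: the `Entity` and `Classification` classes, the `embed_schema_type` helper and the `displayName` and `dataType` property names are assumptions made for the example. Only the names SchemaAttribute, SchemaTypeForAttribute, TypeEmbeddedAttribute and typeName come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Hypothetical classification: a named set of properties stored with its entity."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Entity:
    """Hypothetical metadata entity with a type name, properties and classifications."""
    type_name: str
    properties: dict = field(default_factory=dict)
    classifications: list = field(default_factory=list)


def embed_schema_type(attribute: Entity, schema_type: Entity) -> Entity:
    """Fold a separate schema type entity into a TypeEmbeddedAttribute classification.

    The classification carries the schema type's properties plus a typeName
    property that identifies the corresponding schema type.
    """
    embedded = dict(schema_type.properties)
    embedded["typeName"] = schema_type.type_name
    return Entity(
        type_name=attribute.type_name,
        properties=dict(attribute.properties),
        classifications=attribute.classifications
        + [Classification("TypeEmbeddedAttribute", embedded)],
    )


# Original model: two entities that would be joined by SchemaTypeForAttribute.
# The property names and values here are illustrative assumptions.
attribute = Entity("SchemaAttribute", {"displayName": "employeeNumber"})
schema_type = Entity("PrimitiveSchemaType", {"dataType": "string"})

# New model: one entity that is retrieved together with its type information.
collapsed = embed_schema_type(attribute, schema_type)
print(collapsed.classifications[0].properties)
```

The point of the sketch is that a single lookup of the collapsed entity returns both the attribute and its type details, whereas the original model needs the attribute, the relationship and the schema type.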
Schema type entities are still used:
- to connect Assets and Ports to their schemas
- to connect structural schema types such as maps and external schemas to other types that represent their contents.
Figure 3 shows the use of the schema type:
Figure 3: The SchemaType is still used as the top level element in a schema and for complex structures
Specific Schema Types
The RootSchemaType and SchemaAttribute are specialized to support different structures. The diagrams show how the structure is represented for a SchemaAttribute on the left and how it is represented as a SchemaType on the right.
Primitives
Primitives are single values such as strings, characters and numbers. They are represented by the PrimitiveSchemaType.
Figure 4: The PrimitiveSchemaType
Literals (Constants)
Literals are fixed values, also known as constants. They are represented by the LiteralSchemaType.
Figure 5: The LiteralSchemaType
Enumerations
Enumerations (Enums) define a list of valid values. The valid values are recorded in a ValidValuesSet linked to an EnumSchemaType.
Figure 6: The EnumSchemaType
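As a rough illustration of the link between an enum schema type and its valid values, the following Python sketch models the two elements as plain classes. The classes are hypothetical stand-ins that borrow the type names from the text; the `accepts` method and the sample values are assumptions made for the example and are not part of the open metadata types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidValuesSet:
    """Hypothetical stand-in for the set that records the valid values."""
    name: str
    values: frozenset


@dataclass(frozen=True)
class EnumSchemaType:
    """Hypothetical stand-in for an enum schema type linked to its valid values."""
    name: str
    valid_values: ValidValuesSet

    def accepts(self, value: str) -> bool:
        # A value conforms to the enum only if it appears in the linked set.
        return value in self.valid_values.values


# Illustrative data only: these names and values are invented for the example.
status_values = ValidValuesSet(
    "EmploymentStatusValues", frozenset({"ACTIVE", "ON_LEAVE", "RETIRED"})
)
employment_status = EnumSchemaType("EmploymentStatus", status_values)

print(employment_status.accepts("ACTIVE"))
print(employment_status.accepts("UNKNOWN"))
```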
Linking to a standard schema type
External schema types link to a schema type that is reused in multiple assets - typically it is part of a standard. The use of an external schema type is represented by an ExternalSchemaType.
Figure 7: The ExternalSchemaType
Maps
Maps show how one set of values links to another. They are often used for lookup tables. The map is represented by a MapSchemaType that then links to two other SchemaTypes, one for the type of the starting value and the other for the type of value it is mapped to.
Figure 8: The MapSchemaType
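The two links of a map can be pictured with a small Python sketch. The classes below are hypothetical stand-ins named after the types in the text; the field names `map_from` and `map_to`, the `describe` helper and the sample type names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaType:
    """Hypothetical stand-in for any schema type, identified by a type name."""
    type_name: str


@dataclass(frozen=True)
class MapSchemaType:
    """Hypothetical stand-in for a map that links to two other schema types."""
    map_from: SchemaType  # type of the starting value
    map_to: SchemaType    # type of the value it is mapped to

    def describe(self) -> str:
        return f"map<{self.map_from.type_name}, {self.map_to.type_name}>"


# Illustrative lookup table from a string code to an integer identifier.
country_lookup = MapSchemaType(SchemaType("string"), SchemaType("int"))
print(country_lookup.describe())
```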
Alternative types
In some schemas, it is possible that there are multiple choices for an element's type. This is supported by the SchemaTypeChoice. This links to the options for the SchemaType.
Figure 9: The SchemaTypeChoice
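A choice between alternative types can be sketched as a container of options. The Python below is illustrative only: the classes are hypothetical stand-ins named after the types in the text, and the `options` field, the `allows` method and the sample type names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaType:
    """Hypothetical stand-in for any schema type, identified by a type name."""
    type_name: str


@dataclass(frozen=True)
class SchemaTypeChoice:
    """Hypothetical stand-in for a choice that links to its optional schema types."""
    options: tuple

    def allows(self, candidate: SchemaType) -> bool:
        # The element may take any one of the linked schema types.
        return candidate in self.options


# Illustrative element whose value may be either a string or an integer.
identifier_type = SchemaTypeChoice((SchemaType("string"), SchemaType("int")))

print(identifier_type.allows(SchemaType("int")))
print(identifier_type.allows(SchemaType("date")))
```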
Structures or Records
It is common for an attribute to consist of a collection of other values. For example, an attribute called employee may consist of multiple values such as employee number, name, address, department, ...
These types of attribute are represented by the StructSchemaType.
Figure 10: The StructSchemaType
The relationship between the schema attribute and its nested schema attributes is NestedSchemaAttribute. The relationship between the StructSchemaType and its nested schema attributes is AttributeForSchema.
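The nesting described here can be sketched as a small tree. In the Python below, the `nested` list stands in for NestedSchemaAttribute relationships and the `attributes` list stands in for AttributeForSchema relationships. The classes, the `attribute_paths` helper, and the street and city fields are assumptions made for illustration, not Egeria code.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaAttribute:
    """Hypothetical stand-in for a schema attribute and its nested attributes."""
    name: str
    type_name: str
    nested: list = field(default_factory=list)  # NestedSchemaAttribute links


@dataclass
class StructSchemaType:
    """Hypothetical stand-in for a struct schema type and its attributes."""
    name: str
    attributes: list = field(default_factory=list)  # AttributeForSchema links


def attribute_paths(attributes, prefix=""):
    """Return dotted paths for every attribute, walking nested attributes depth first."""
    paths = []
    for attribute in attributes:
        path = prefix + attribute.name
        paths.append(path)
        paths.extend(attribute_paths(attribute.nested, path + "."))
    return paths


# Illustrative employee structure; street and city are invented for the example.
address = SchemaAttribute(
    "address",
    "StructSchemaType",
    [SchemaAttribute("street", "string"), SchemaAttribute("city", "string")],
)
employee = StructSchemaType(
    "employee",
    [
        SchemaAttribute("employeeNumber", "string"),
        SchemaAttribute("name", "string"),
        address,
    ],
)

print(attribute_paths(employee.attributes))
# prints ['employeeNumber', 'name', 'address', 'address.street', 'address.city']
```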
Related Information
Data classes provide the ability to define logical data types to complement the schema elements.
Open Metadata Types
Open Metadata types for connecting schemas to other types of elements:
- 0503 Asset Schema - for the relationship between an Asset and its top level SchemaType.
- 0520 Process Schemas - showing how a schema type can be attached to a process port.
Open Metadata Types for different types of data structures:
- 0501 Schema Elements - for SchemaElement, SchemaType, PrimitiveSchemaType, LiteralSchemaType, EnumSchemaType and SchemaTypeChoice.
- 0505 Schema Attributes - for SchemaAttribute, ComplexSchemaType, StructSchemaType.
- 0507 External Schema Types - for ExternalSchemaType.
- 0511 Map Schema Element - for MapSchemaType.
- 0512 Derived Schema Elements - for DerivedSchemaTypeQueryTarget.
Specializations of the main types of schema structures for particular types of technology, which enable retrieval of technology-specific schema elements (for example, a query for relational columns with a particular characteristic):
- 0530 Tabular Schema - for TabularSchemaType and TabularColumn.
- 0531 Document Schemas - for DocumentSchemaType and DocumentSchemaAttribute.
- 0532 Object Schemas - for ObjectSchemaType and ObjectAttribute.
- 0533 Graph Schema - for types associated with graph stores.
- 0534 Relational Schema - for types associated with relational data.
- 0535 Event Schema - for EventTypeList, EventType and EventSchemaAttribute.
- 0536 API Schemas - for types associated with APIs.
Open Metadata and Governance APIs
APIs that support the definition of schemas:
- Asset Owner OMAS
- Asset Manager OMAS
- Data Manager OMAS
- Catalog Integrator OMIS
- Database Integrator OMIS
- Files Integrator OMIS
- API Integrator OMIS
- Topic Integrator OMIS
- Governance Action OMES
Other types of information associated with an Asset: