Schema

A schema describes the structure of the data associated with an Asset. The technology that supports the asset often limits the structural choices for data. For example:

  • A relational database organizes data into collections of tables and columns.
  • Technologies such as JSON or XML organize data into nested structures.
  • Graph databases organize data into nodes and relationships.
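
As a rough illustration of the three styles listed above, the sketch below holds the same employee record in a tabular, a nested and a graph shape. All of the names and values (`employee_table`, `WORKS_IN`, "Ada" and so on) are invented for this example and do not come from any particular technology.

```python
import json

# Relational style: a table is a set of named columns plus rows of values.
employee_table = {
    "columns": ["employee_number", "name", "department_id"],
    "rows": [(1001, "Ada", 7)],
}

# Nested style (JSON or XML): values can contain further structures.
employee_document = json.loads(
    '{"employeeNumber": 1001, "name": "Ada",'
    ' "department": {"id": 7, "name": "Research"}}'
)

# Graph style: nodes carry properties and relationships connect them.
nodes = {
    "e1001": {"label": "Employee", "name": "Ada"},
    "d7": {"label": "Department", "name": "Research"},
}
relationships = [("e1001", "WORKS_IN", "d7")]

# The same question ("which department?") is answered differently per shape.
department_from_document = employee_document["department"]["name"]
department_from_graph = next(
    nodes[target]["name"]
    for source, kind, target in relationships
    if source == "e1001" and kind == "WORKS_IN"
)
```

A schema has to describe each of these shapes, which is why the structure of the schema depends on the technology behind the asset.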

These differences need to be represented in the open metadata types. At the same time, data governance is concerned with the accuracy and appropriate use of individual data values. Governing each data item individually would be very expensive, so data governance practices aim to group like data together so that it can be governed in a consistent way. As such, the open metadata types provide a root set of types that all the specific schema structures inherit from.

Schema Elements

In open metadata, a schema is described using a linked subgraph of schema elements. A schema begins with a schema element called a schema type. The data fields described by the schema are represented by schema attributes (think of each one as a variable), each with its own schema type. This schema type describes the structure of the data associated with the schema attribute.

In the early versions of Egeria, the schema attribute and the schema type were represented as two separate entities in the open metadata types with a SchemaTypeForAttribute relationship to connect them together. This is shown in figure 1.

Figure 1: Original model for SchemaAttribute and its SchemaType

However, it became obvious that since these two elements need to be retrieved together, it is much more efficient to represent the schema type as a classification on the SchemaAttribute, since classifications are typically stored, distributed and retrieved with their entity. The new classification is called TypeEmbeddedAttribute, and it contains all the properties found in the schema types plus a typeName property to identify the corresponding schema type.

Figure 2 shows the new types for representing a schema attribute and its type.

Figure 2: Collapsing SchemaAttribute and SchemaType into an entity with a classification
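
The effect of this change can be sketched in a few lines of Python. This is a simplified model, not Egeria code: the `Entity` and `Classification` classes and the `displayName` and `dataType` property names are assumptions made for illustration. Only SchemaAttribute, TypeEmbeddedAttribute and typeName come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Classification:
    """A named set of properties that is stored with the entity it describes."""
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Entity:
    """A simplified metadata entity (hypothetical, not an Egeria class)."""
    type_name: str
    properties: Dict[str, Any] = field(default_factory=dict)
    classifications: List[Classification] = field(default_factory=list)

    def classification(self, name: str) -> Classification:
        """Return the classification with the given name, or raise KeyError."""
        for candidate in self.classifications:
            if candidate.name == name:
                return candidate
        raise KeyError(name)


# One entity now carries both the attribute and its type information.
# "displayName" and "dataType" are illustrative property names.
column = Entity(
    type_name="SchemaAttribute",
    properties={"displayName": "employee_number"},
    classifications=[
        Classification(
            name="TypeEmbeddedAttribute",
            properties={"typeName": "PrimitiveSchemaType", "dataType": "string"},
        )
    ],
)

# A single retrieval of the entity is enough to find out its schema type.
embedded_type = column.classification("TypeEmbeddedAttribute")
schema_type_name = embedded_type.properties["typeName"]
```

Because the classification travels with the entity, no second lookup across a relationship is needed to discover the type of the attribute.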

Schema type entities are still used:

  • to connect Assets and Ports to their schemas
  • to connect structural schema types such as maps and external schemas to other types that represent their contents.

Figure 3 shows the use of the schema type:

Figure 3: The SchemaType is still used as the top level element in a schema and for complex structures

Specific Schema Types

The RootSchemaType and SchemaAttribute are specialized to support different structures. The diagrams show how the structure is represented for a SchemaAttribute on the left and how it is represented as a SchemaType on the right.

Primitives

Primitives are single values such as strings, characters and numbers. They are represented by the PrimitiveSchemaType.

Figure 4: The PrimitiveSchemaType

Literals (Constants)

Literals are fixed values, also known as constants. They are represented by the LiteralSchemaType.

Figure 5: The LiteralSchemaType

Enumerations

Enumerations (Enums) define a list of valid values. The valid values are recorded in a ValidValuesSet linked to an EnumSchemaType.

Figure 6: The EnumSchemaType
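
A minimal sketch of this arrangement is shown below. The class names mirror the open metadata type names, but the fields, the `is_valid` helper and the sample values are assumptions for illustration rather than part of the open metadata type definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidValuesSet:
    """Holds the list of values that are considered valid."""
    name: str
    values: List[str] = field(default_factory=list)


@dataclass
class EnumSchemaType:
    """An enumeration type that points at the set of its valid values."""
    name: str
    valid_values: ValidValuesSet

    def is_valid(self, value: str) -> bool:
        """Hypothetical helper: check a value against the linked set."""
        return value in self.valid_values.values


# Sample values are invented for this example.
status_values = ValidValuesSet(
    name="EmploymentStatus",
    values=["ACTIVE", "ON_LEAVE", "RETIRED"],
)
status_type = EnumSchemaType(name="employment_status", valid_values=status_values)

active_is_valid = status_type.is_valid("ACTIVE")
unknown_is_valid = status_type.is_valid("UNKNOWN")
```

Keeping the values in a separate set means the same list can be linked to more than one enumeration type.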

Linking to a standard schema type

External schema types link to a schema type that is reused in multiple assets; typically it is part of a standard. The use of an external schema type is represented by an ExternalSchemaType.

Figure 7: The ExternalSchemaType

Maps

Maps show how one set of values links to another. They are often used for lookup tables. The map is represented by a MapSchemaType that then links to two other SchemaTypes, one for the type of the starting value and the other for the type of value it is mapped to.

Figure 8: The MapSchemaType
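
The two links from a map can be sketched as follows. The field names `map_from` and `map_to`, the `data_type` property and the `describe` helper are hypothetical and chosen only to make the example readable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaType:
    """A simplified schema type; data_type is an illustrative property."""
    type_name: str
    data_type: str = ""


@dataclass(frozen=True)
class MapSchemaType:
    """A map links to one type for its starting values and one for its results."""
    map_from: SchemaType
    map_to: SchemaType

    def describe(self) -> str:
        """Hypothetical helper that summarizes the two linked types."""
        return f"map<{self.map_from.data_type}, {self.map_to.data_type}>"


# A lookup table from a country code (string) to a numeric region identifier.
country_lookup = MapSchemaType(
    map_from=SchemaType("PrimitiveSchemaType", "string"),
    map_to=SchemaType("PrimitiveSchemaType", "int"),
)

summary = country_lookup.describe()
```

Either side of the map could itself be a more complex schema type, which is one reason schema type entities are still needed alongside the classification.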

Alternative types

In some schemas, an element may have more than one possible type. This is supported by the SchemaTypeChoice, which links to each of the options for the schema type.

Figure 9: The SchemaTypeChoice
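
As a small sketch, a choice can be modelled as a list of alternative schema types. The `options` field, the `data_type` property and the `allows` helper are assumptions for this example and are not taken from the open metadata type definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaType:
    """A simplified schema type; data_type is an illustrative property."""
    type_name: str
    data_type: str = ""


@dataclass
class SchemaTypeChoice:
    """Links to each schema type that an element is allowed to take."""
    options: List[SchemaType] = field(default_factory=list)

    def allows(self, data_type: str) -> bool:
        """Hypothetical helper: is the data type one of the listed options?"""
        return any(option.data_type == data_type for option in self.options)


# An identifier that may be supplied either as a string or as an integer.
identifier_type = SchemaTypeChoice(
    options=[
        SchemaType("PrimitiveSchemaType", "string"),
        SchemaType("PrimitiveSchemaType", "int"),
    ]
)

string_allowed = identifier_type.allows("string")
date_allowed = identifier_type.allows("date")
```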

Structures or Records

It is common for an attribute to consist of a collection of other values. For example, an attribute called employee may consist of multiple values such as employee number, name, address, department, ...

These types of attribute are represented by the StructSchemaType.

Figure 10: The StructSchemaType

The relationship between the schema attribute and its nested schema attributes is NestedSchemaAttribute. The relationship between the StructSchemaType and its nested schema attributes is AttributeForSchema.
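
The nesting of attributes can be sketched as a small tree. In this simplified model the `nested` list stands in for NestedSchemaAttribute relationships; the class layout, the `attribute_paths` helper and the sample field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class SchemaAttribute:
    """A simplified schema attribute with its embedded type name."""
    name: str
    type_name: str
    # Each entry stands in for one NestedSchemaAttribute relationship.
    nested: List["SchemaAttribute"] = field(default_factory=list)


def attribute_paths(attribute: SchemaAttribute, prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of the attribute and of everything nested in it."""
    path = f"{prefix}.{attribute.name}" if prefix else attribute.name
    yield path
    for child in attribute.nested:
        yield from attribute_paths(child, path)


# The employee example, with an address that is itself a structure.
employee = SchemaAttribute("employee", "StructSchemaType", [
    SchemaAttribute("employee_number", "PrimitiveSchemaType"),
    SchemaAttribute("name", "PrimitiveSchemaType"),
    SchemaAttribute("address", "StructSchemaType", [
        SchemaAttribute("city", "PrimitiveSchemaType"),
    ]),
])

paths = list(attribute_paths(employee))
```

Walking the tree depth first visits each attribute once, so `paths` lists the structure attribute followed by every field nested beneath it.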

Data classes provide the ability to define logical data types to complement the schema elements.

Open Metadata Types

Open Metadata types for connecting schemas to other types of elements:

Open Metadata Types for different types of data structures:

Specializations of the main types of schema structures for particular types of technology. They enable retrieval of technology-specific schema elements. For example, a query for relational columns with a particular characteristic.

Open Metadata and Governance APIs

APIs that support the definition of schemas:

Other types of information associated with an Asset:

