
Open Metadata Archives

An open metadata archive is a portable collection of open metadata type definitions and instances. It can be loaded each time a metadata access server starts up or added to a running metadata access server.

There are three types of open metadata archive. The type signals how the archive is intended to be used:

  • A content pack contains standard metadata that is generally useful. It may come from the Egeria community or third parties. It can be loaded into many repositories, whether these repositories are connected or not via cohorts. This is a useful way to distribute open metadata types or definitions for a standard.
  • A metadata export contains a collection of metadata elements that have been extracted from a specific open metadata repository to load into another. It is used to transfer metadata between repositories that are not connected via cohorts.
  • A repository backup contains a collection of metadata elements that is intended to act as a backup for a server. It typically contains metadata instances from the server's local metadata collection. This archive is expected to be loaded back into the same repository. This can be done at any time. If the repository contains more recent content, the older content in the archive is ignored.

By the rules of metadata provenance, the elements in an open metadata archive are read-only when loaded into an open metadata repository unless the repository has the same metadata collection id as the element.
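
The provenance rule above can be illustrated with a minimal sketch. The `Element` class and the `is_read_only` function are hypothetical names invented for this example and are not part of any Egeria API. The sketch assumes that each element carries the metadata collection id of its origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a metadata element loaded from an archive."""
    guid: str
    metadata_collection_id: str  # id of the collection the element originates from


def is_read_only(element: Element, local_metadata_collection_id: str) -> bool:
    """An element is editable only in a repository with the same metadata collection id."""
    return element.metadata_collection_id != local_metadata_collection_id


archive_element = Element(guid="e-1", metadata_collection_id="archive-collection")
print(is_read_only(archive_element, "local-collection"))    # True
print(is_read_only(archive_element, "archive-collection"))  # False
```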

Figure 1 shows a content pack being loaded into a server. When an element from an open metadata archive is loaded, it is compared against the content of the local repository. If it is a new element, or a later version than the one in the local repository, the element is stored and then distributed to any connected cohorts.

Figure 1: Loading a content pack

Notice that due to the distribution of this metadata across the cohorts, it is only necessary to load the archive into one of the servers.
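
The comparison made when an archive element is loaded might look roughly like the sketch below. It assumes that each instance carries an integer version that grows with each update. `Instance`, `load_instance` and the dictionary used as a repository are illustrative names, not Egeria classes. Distribution to cohorts is left to the caller.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Instance:
    """Hypothetical archive element; `version` is assumed to increase with each update."""
    guid: str
    version: int
    payload: str


def load_instance(repository: Dict[str, Instance], incoming: Instance) -> bool:
    """Store the element if it is new or newer than the local copy.

    Returns True when the element was stored (the caller would then share it
    with any connected cohorts) and False when the archive copy was ignored.
    """
    existing = repository.get(incoming.guid)
    if existing is None or incoming.version > existing.version:
        repository[incoming.guid] = incoming
        return True
    return False
```

Under these assumptions, loading the same element a second time returns `False` and leaves the repository unchanged.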

When data and other types of assets are being transported between organizations, it is possible to use a metadata export open metadata archive to pass the related metadata as well. This is shown in figure 2.

Figure 2: Exporting and reimporting metadata between unconnected repositories

Figure 3 shows a metadata export archive being used to create a backup of selected metadata. The backup can be used to recover the metadata repository content after a bad load or other operational error.

Figure 3: Selective backup of metadata elements

Creating open metadata archives

There are two approaches to creating an open metadata archive:

  • Assemble the contents in memory and push to the open metadata archive store when the archive is assembled.
  • Push the elements in the archive one-by-one as they are built.

The first approach works well for small archives such as content packs. The second approach suits large archives such as backups, which may not fit in memory.
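
The difference between the two approaches can be sketched as follows. Both functions and the file layouts they produce are assumptions made for illustration. They do not reproduce the real archive format or the open metadata archive store connector interface.

```python
import json
from pathlib import Path
from typing import Iterable


def write_assembled(elements: Iterable[dict], path: Path) -> None:
    """Approach 1: assemble the whole archive in memory, then write it in one step."""
    archive = {"elements": list(elements)}  # the entire content is held in memory
    path.write_text(json.dumps(archive), encoding="utf-8")


def write_streamed(elements: Iterable[dict], path: Path) -> int:
    """Approach 2: write each element as it is built, one JSON document per line."""
    count = 0
    with path.open("w", encoding="utf-8") as out:
        for element in elements:  # only one element is held in memory at a time
            out.write(json.dumps(element) + "\n")
            count += 1
    return count
```

Passing a generator to `write_streamed` keeps memory use roughly constant, however many elements the archive holds.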

There are three supporting components used in the construction process:

  • Helper - logic to build the different types of elements for the archive.
  • Builder - logic to assemble the elements into the archive structure.
  • Writer - logic to store the contents of the archive on disk.

They are driven by specific archive logic that knows what content to add to the archive and an open metadata archive store connector that is responsible for the storage of the archive.

Figure 4: Assembling an open metadata archive in memory and then writing it out to disk once it is complete

Figure 5: Assembling an open metadata archive directly on disk

The archive logic can either be an offline utility or an archive service running in an archive engine.

Inside an Open Metadata Archive

An open metadata archive has three parts, as shown in Figure 6. The header defines the type of archive and its properties. The type store contains new attribute type definitions, new type definitions and updates to type definitions (patches). The instance store contains new instances (entities, relationships and classifications).

Figure 6: Inside an Open Metadata Archive

Example of the header from the Cloud Information Model archive
{
  "class":"OpenMetadataArchive",
  "archiveProperties":
      {
          "class":"OpenMetadataArchiveProperties",
          "archiveGUID":"9dc75637-92a7-4926-b47b-a3d407546f89",
          "archiveName":"Cloud Information Model (CIM) glossary and concept model",
          "archiveDescription":"Data types for commerce focused cloud applications.",
          "archiveType":"CONTENT_PACK",
          "originatorName":"The Cloud Information Model",
          "originatorLicense":"Apache-2.0",
          "creationDate":1570383385107,
          "dependsOnArchives":["bce3b0a0-662a-4f87-b8dc-844078a11a6e"]
      }, 
   "archiveTypeStore":{},
   "archiveInstanceStore":{}
}
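
As a rough illustration, the header above can be read with any JSON parser. The sketch below uses only the Python standard library. It assumes that `creationDate` is a count of milliseconds since the Unix epoch, which fits the size of the value but is not stated in the example.

```python
import json
from datetime import datetime, timezone

# Header copied from the example above.
HEADER_JSON = """
{
  "class":"OpenMetadataArchive",
  "archiveProperties":
      {
          "class":"OpenMetadataArchiveProperties",
          "archiveGUID":"9dc75637-92a7-4926-b47b-a3d407546f89",
          "archiveName":"Cloud Information Model (CIM) glossary and concept model",
          "archiveDescription":"Data types for commerce focused cloud applications.",
          "archiveType":"CONTENT_PACK",
          "originatorName":"The Cloud Information Model",
          "originatorLicense":"Apache-2.0",
          "creationDate":1570383385107,
          "dependsOnArchives":["bce3b0a0-662a-4f87-b8dc-844078a11a6e"]
      },
   "archiveTypeStore":{},
   "archiveInstanceStore":{}
}
"""

archive = json.loads(HEADER_JSON)
properties = archive["archiveProperties"]

# Assumption: creationDate is in milliseconds since the Unix epoch.
created = datetime.fromtimestamp(properties["creationDate"] / 1000, tz=timezone.utc)

print(properties["archiveType"])        # CONTENT_PACK
print(properties["dependsOnArchives"])  # ['bce3b0a0-662a-4f87-b8dc-844078a11a6e']
print(created.date().isoformat())
```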

Storage structures

Figure 7: Storing an open metadata archive as a single file

Figure 8: Storing an open metadata archive in a directory structure

Loading open metadata archives

A metadata server's configuration document can list the archives to load each time the server is started. This is useful if the server does not retain metadata through a server restart (for example, when it uses the in-memory metadata repository). Open metadata archives may also be loaded while the server is running using a REST API call.

These articles describe how to load open metadata archives into a server:

The archive loads in the following order:

  • Attribute Type Definitions (AttributeTypeDefs) from the type store.
      • PrimitiveDefs
      • CollectionDefs
      • EnumDefs
  • New Type Definitions (TypeDefs) from the type store.
      • EntityDefs
      • RelationshipDefs
      • ClassificationDefs
  • Updates to type definitions (TypeDefPatches)
  • New Instances
      • Entities
      • Relationships
      • Classifications
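
The load order above can be sketched as a simple ordered traversal. The phase names used as dictionary keys are hypothetical labels for this example and do not match the field names of the real archive format. The ordering presumably exists so that type definitions are in place before the instances that refer to them.

```python
from typing import Iterator, Tuple

# Load phases in the order listed above. The keys are hypothetical names for
# this sketch, not the field names of the real archive format.
LOAD_ORDER = [
    "primitive_defs", "collection_defs", "enum_defs",            # attribute type definitions
    "entity_defs", "relationship_defs", "classification_defs",   # new type definitions
    "type_def_patches",                                          # updates to type definitions
    "entities", "relationships", "classifications",              # new instances
]


def iter_in_load_order(archive: dict) -> Iterator[Tuple[str, object]]:
    """Yield (phase, item) pairs phase by phase, whatever the key order in `archive`."""
    for phase in LOAD_ORDER:
        for item in archive.get(phase, []):
            yield phase, item
```

A loader built on this sketch would process each yielded item in turn and skip any phase that is missing from the archive.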

The archive is loaded once and its content is immediately available. If the repository persists metadata over a server restart then this archive content continues to be available after the server restarts.

No matter how many times an archive is loaded, only one copy of its content is added to the repository.

Supported utilities for open metadata archives

Egeria supports the following open metadata archives. Associated with each archive are utilities that help you build additional archives of your own content.

