Unity Catalog

Unity Catalog is a data management catalog that governs access to data. It typically manages data from data lakes and lakehouses, where much of the data is in Parquet format with a table abstraction over the top. Unity Catalog also supports folders of files (called Volumes) and functions. It can provide access to the data in its catalogs and run the functions.

The picture below shows Unity Catalog managing access to data in Delta Lake.

Unity Catalog with Delta Lake

Internally, Unity Catalog's metadata is organized into catalogs. (So one way to think of Unity Catalog is as a 'catalog of catalogs'.) Each catalog has multiple schemas, and these contain the resources:

  • Tables - these are virtual tables, typically backed by an Apache Parquet file.
  • Functions - these are callable functions, typically implemented in SQL, but may be a callable external component.
  • Volumes - these are collections of files.

As a result of this structure, the resources in Unity Catalog have a three-level name: catalogName.schemaName.resourceName.
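
The three-level naming can be illustrated with a small helper. This is only a sketch: the `ResourceName` type and `parse_full_name` function are hypothetical names introduced here, the sample name is illustrative, and the code assumes that none of the three parts contains a dot.

```python
from typing import NamedTuple


class ResourceName(NamedTuple):
    """A three-level Unity Catalog name (hypothetical helper type)."""
    catalog: str
    schema: str
    resource: str

    def full_name(self) -> str:
        # Join the parts back into catalogName.schemaName.resourceName.
        return f"{self.catalog}.{self.schema}.{self.resource}"


def parse_full_name(full_name: str) -> ResourceName:
    """Split catalogName.schemaName.resourceName into its three parts."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.resource, got {full_name!r}")
    return ResourceName(*parts)


name = parse_full_name("unity.default.marksheet")
print(name.schema)       # default
print(name.full_name())  # unity.default.marksheet
```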

Unity Catalog Technology Type Names

The technology type names (aka deployed implementation types) added to Egeria's reference data for Unity Catalog are:

  • Unity Catalog Server - The OSS Unity Catalog (UC) Server is an operational data platform 'catalog of catalogs' that supports controlled access to data managed through related data platforms.
  • Unity Catalog Catalog - An operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.
  • Unity Catalog Schema - A schema that organizes data assets for an operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.
  • Unity Catalog Table - A relational table within the Unity Catalog (UC) 'catalog of catalogs'.
  • Unity Catalog Function - A function found in Unity Catalog (UC) that is working with data.
  • Unity Catalog Volume - A collection of related data files within the Unity Catalog (UC) 'catalog of catalogs'.
JSON output from tech type search for 'Unity Catalog'
{
  "class": "TechnologyTypeSummaryListResponse",
  "relatedHTTPCode": 200,
  "elements": [
    {
      "technologyTypeGUID": "2d89345f-2650-4c04-bd5c-8cdbab7a0b79",
      "qualifiedName": "Egeria:ValidMetadataValue:SoftwareServer:deployedImplementationType-(Unity Catalog Server)",
      "name": "Unity Catalog Server",
      "description": "The OSS Unity Catalog (UC) Server is an operational data platform 'catalog of catalogs' that supports controlled access to data managed through a related data platforms.",
      "category": "SoftwareServer:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "2b28dd27-3d4e-4c75-a3e8-cbbcbe8cb62f",
      "qualifiedName": "Egeria:ValidMetadataValue:Catalog:deployedImplementationType-(Unity Catalog Catalog)",
      "name": "Unity Catalog Catalog",
      "description": "An operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.",
      "category": "Catalog:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "c56ca4d1-ed5a-4b05-b75b-e4b6bd3500ff",
      "qualifiedName": "Egeria:ValidMetadataValue:DeployedDatabaseSchema:deployedImplementationType-(Unity Catalog Schema)",
      "name": "Unity Catalog Schema",
      "description": "A schema that organizes data assets for an operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.",
      "category": "DeployedDatabaseSchema:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "3a1ad610-f5c5-4aba-a766-63965ac528be",
      "qualifiedName": "Egeria:ValidMetadataValue:VirtualRelationalTable:deployedImplementationType-(Unity Catalog Table)",
      "name": "Unity Catalog Table",
      "description": "A relational table within the Unity Catalog (UC) 'catalog of catalogs'.",
      "category": "VirtualRelationalTable:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "7f15dd5f-7569-4697-a3f1-491e399f4351",
      "qualifiedName": "Egeria:ValidMetadataValue:DeployedAPI:deployedImplementationType-(Unity Catalog Function)",
      "name": "Unity Catalog Function",
      "description": "A function found in Unity Catalog (UC) that is working with data.",
      "category": "DeployedAPI:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "dbabe8cb-345e-4665-a665-1bef56a26ecd",
      "qualifiedName": "Egeria:ValidMetadataValue:DataFolder:deployedImplementationType-(Unity Catalog Volume)",
      "name": "Unity Catalog Volume",
      "description": "A collection of related data files within the Unity Catalog (UC) 'catalog of catalogs'.",
      "category": "DataFolder:deployedImplementationType"
    }
  ]
}
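
As a rough illustration of how a caller might read this response, the sketch below parses an abbreviated copy of the JSON and pairs each technology type name with the prefix of its category. In the output above that prefix matches the open metadata type name, but treating it as such is an assumption of this example; the function name `open_metadata_types` is hypothetical.

```python
import json

# Abbreviated copy of the response above (two of the six elements).
response_text = """
{
  "class": "TechnologyTypeSummaryListResponse",
  "relatedHTTPCode": 200,
  "elements": [
    {"name": "Unity Catalog Server",
     "category": "SoftwareServer:deployedImplementationType"},
    {"name": "Unity Catalog Table",
     "category": "VirtualRelationalTable:deployedImplementationType"}
  ]
}
"""


def open_metadata_types(text: str) -> dict[str, str]:
    """Map each technology type name to the text before ':' in its category."""
    response = json.loads(text)
    return {
        element["name"]: element["category"].split(":", 1)[0]
        for element in response.get("elements", [])
    }


print(open_metadata_types(response_text))
# {'Unity Catalog Server': 'SoftwareServer', 'Unity Catalog Table': 'VirtualRelationalTable'}
```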

Open Metadata Type Mapping for Unity Catalog

The mapping from Unity Catalog metadata elements to the Open Metadata Types used in the Open Metadata Ecosystem is as follows:

Technology Type | Open Metadata Type
Unity Catalog Server | SoftwareServer
Unity Catalog Catalog | Catalog
Unity Catalog Schema | DeployedDatabaseSchema
Unity Catalog Function | DeployedAPI with an associated DeployedSoftwareComponent for its implementation.
Unity Catalog Table | VirtualRelationalTable with an associated DataFolder for its files.
Unity Catalog Volume | DataFolder

In addition, each of these elements has a PropertyFacet and an External Identifier attached. The property facet contains implementation-specific details. The external identifier includes the GUID from Unity Catalog plus other mapping values, such as the catalog name, schema name and short name. These values enable the Unity Catalog connectors to check that the name of an element has not changed since the last time it was retrieved from Unity Catalog.
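
The name check described above might look something like the following sketch for a table. It is not the connectors' actual logic: the function `name_has_changed` is hypothetical, both inputs are plain dictionaries, and the property names are taken from the table field mapping in this document.

```python
def name_has_changed(mapping_properties: dict[str, str],
                     uc_element: dict[str, str]) -> bool:
    """Compare the names saved at the last retrieval with the current ones."""
    # Saved mapping property -> current Unity Catalog property (table case).
    saved_to_current = {
        "ucCatalogName": "catalog_name",
        "ucSchemaName": "schema_name",
        "ucTableName": "name",
    }
    return any(
        mapping_properties.get(saved) != uc_element.get(current)
        for saved, current in saved_to_current.items()
    )


# Illustrative values only.
saved = {"ucCatalogName": "unity", "ucSchemaName": "default", "ucTableName": "marksheet"}
current = {"catalog_name": "unity", "schema_name": "default", "name": "marksheet_v2"}
print(name_has_changed(saved, current))  # True
```

Because the external identifier holds the GUID, a renamed element can still be matched to its open metadata counterpart even though its three-level name differs.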

The diagram below illustrates the mapping of Unity Catalog metadata resources to the Open Metadata Types.

Type Mapping

Below is the field mapping from Unity Catalog to these Open Metadata Types:

Catalog field mapping

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version
id | ExternalIdentifier.identifier | Yes
name | Catalog.name | Yes
comment | Catalog.description | Yes
owner | Ownership.owner | No
name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes
created_by | ExternalIdentifier.externalInstanceCreatedBy | No
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No
catalog_type | PropertyFacet.properties.ucCatalogType | No
metastore_id | PropertyFacet.properties.ucMetastoreId | No
isolation_mode | PropertyFacet.properties.ucIsolationMode | No
accessible_in_current_workspace | PropertyFacet.properties.ucAccessibleInCurrentWorkspace | No
browse_only | PropertyFacet.properties.ucBrowseOnly | No
securable_type | PropertyFacet.properties.ucSecurableType | No
securable_kind | PropertyFacet.properties.ucSecurableKind | No

Additional Values | Egeria Type and Attribute | Supported in OSS Version
Unity Catalog Catalog: Server URL : full_name | Catalog.qualifiedName | Yes
Unity Catalog Catalog | Catalog.deployedImplementationType | Yes
Databricks Unity Catalog Catalog: Server URL | Catalog.qualifiedName | No
Databricks Unity Catalog Catalog | Catalog.deployedImplementationType | No
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes

Schema field mapping

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version
schema_id | ExternalIdentifier.identifier | Yes
name | DeployedDatabaseSchema.name | Yes
full_name | DeployedDatabaseSchema.resourceName | Yes
comment | DeployedDatabaseSchema.description | Yes
owner | Ownership.owner | No
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes
created_by | ExternalIdentifier.externalInstanceCreatedBy | No
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No
catalog_name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes
name | ExternalIdentifier.mappingProperties.ucSchemaName | Yes
catalog_type | PropertyFacet.properties.ucCatalogType | No
metastore_id | PropertyFacet.properties.ucMetastoreId | No
browse_only | PropertyFacet.properties.ucBrowseOnly | No
securable_type | PropertyFacet.properties.ucSecurableType | No
securable_kind | PropertyFacet.properties.ucSecurableKind | No
properties | PropertyFacet.properties | No

Additional Values | Egeria Type and Attribute | Supported in OSS Version
Unity Catalog Schema: Server URL : full_name | DeployedDatabaseSchema.qualifiedName | Yes
Unity Catalog Schema | DeployedDatabaseSchema.deployedImplementationType | Yes
Databricks Unity Catalog Schema: Server URL : full_name | DeployedDatabaseSchema.qualifiedName | No
Databricks Unity Catalog Schema | DeployedDatabaseSchema.deployedImplementationType | No
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes

Volume field mapping

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version
volume_id | ExternalIdentifier.identifier | Yes
name | FileFolder.name + ExternalIdentifier.mappingProperties.ucVolumeName | Yes
full_name | FileFolder.resourceName | Yes
comment | FileFolder.description | Yes
storage_location | FileFolder.pathName + Endpoint.networkAddress + PropertyFacet.properties.ucStorageLocation | Yes
owner | Ownership.owner | No
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes
created_by | ExternalIdentifier.externalInstanceCreatedBy | No
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No
catalog_name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes
schema_name | ExternalIdentifier.mappingProperties.ucSchemaName | Yes
volume_type | PropertyFacet.properties.ucVolumeType | Yes
metastore_id | PropertyFacet.properties.ucMetastoreId | No
browse_only | PropertyFacet.properties.ucBrowseOnly | No
securable_type | PropertyFacet.properties.ucSecurableType | No
securable_kind | PropertyFacet.properties.ucSecurableKind | No
resource_name | PropertyFacet.properties.ucResourceName | No

Additional Values | Egeria Type and Attribute | Supported in OSS Version
Unity Catalog Volume: Server URL : full_name | FileFolder.qualifiedName | Yes
Unity Catalog Volume | FileFolder.deployedImplementationType | Yes
Databricks Unity Catalog Volume: Server URL : full_name | FileFolder.qualifiedName | No
Databricks Unity Catalog Volume | FileFolder.deployedImplementationType | No
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes

Table field mapping

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version
table_id | ExternalIdentifier.identifier | Yes
name | VirtualRelationalTable.name + ExternalIdentifier.mappingProperties.ucTableName | Yes
full_name | VirtualRelationalTable.resourceName | Yes
comment | VirtualRelationalTable.description | Yes
owner | Ownership.owner | No
data_source_format | DataAssetEncoding.encoding | Yes
columns.name | RelationalColumn.displayName | Yes
columns.comment | RelationalColumn.description | Yes
columns.position | RelationalColumn.position | Yes
columns.nullable | RelationalColumn.isNullable | Yes
columns.type_precision | RelationalColumn.precision | Yes
columns.type_scale | RelationalColumn.significantDigits | Yes
columns.partition_index | RelationalColumn.additionalProperties.ucPartitionIndex | Yes
columns.type_text | TypeEmbeddedAttribute.displayName | Yes
columns.type_name | TypeEmbeddedAttribute.dataType | Yes
columns.type_interval_type | TypeEmbeddedAttribute.additionalProperties.ucTypeIntervalType | Yes
columns.type_json | TypeEmbeddedAttribute.additionalProperties.ucTypeJSON | Yes
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes
created_by | ExternalIdentifier.externalInstanceCreatedBy | No
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No
catalog_name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes
schema_name | ExternalIdentifier.mappingProperties.ucSchemaName | Yes
storage_location | PropertyFacet.properties.ucStorageLocation | Yes
table_type | PropertyFacet.properties.ucTableType | Yes
metastore_id | PropertyFacet.properties.ucMetastoreId | No
browse_only | PropertyFacet.properties.ucBrowseOnly | No
securable_type | PropertyFacet.properties.ucSecurableType | No
securable_kind | PropertyFacet.properties.ucSecurableKind | No
resource_name | PropertyFacet.properties.ucResourceName | No

Additional Values | Egeria Type and Attribute | Supported in OSS Version
Unity Catalog Table: Server URL : full_name | VirtualRelationalTable.qualifiedName | Yes
Unity Catalog Table | VirtualRelationalTable.deployedImplementationType | Yes
Databricks Unity Catalog Table: Server URL : full_name | VirtualRelationalTable.qualifiedName | No
Databricks Unity Catalog Table | VirtualRelationalTable.deployedImplementationType | No
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes
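
To make the table mapping above more concrete, here is a minimal sketch that rearranges a Unity Catalog table description into a nested dictionary keyed by the Egeria type names. It covers only the rows marked as supported in the OSS version and leaves out the qualified name and the type-embedded column attributes. The function `map_table`, the `RelationalColumns` key, the sample values and the server URL are all assumptions made for illustration; the real mapping is implemented by the templates and connectors, not by code like this.

```python
def map_table(uc_table: dict, server_url: str) -> dict:
    """Arrange Unity Catalog table properties under the Egeria types they map to."""
    return {
        "VirtualRelationalTable": {
            "name": uc_table["name"],
            "resourceName": uc_table["full_name"],
            "description": uc_table.get("comment"),
            "deployedImplementationType": "Unity Catalog Table",
        },
        "DataAssetEncoding": {"encoding": uc_table.get("data_source_format")},
        "ExternalIdentifier": {
            "identifier": uc_table["table_id"],
            "externalInstanceCreationTime": uc_table.get("created_at"),
            "externalInstanceLastUpdateTime": uc_table.get("updated_at"),
            "mappingProperties": {
                "ucCatalogName": uc_table["catalog_name"],
                "ucSchemaName": uc_table["schema_name"],
                "ucTableName": uc_table["name"],
                "serverNetworkAddress": server_url,
            },
        },
        "PropertyFacet": {
            "properties": {
                "ucStorageLocation": uc_table.get("storage_location"),
                "ucTableType": uc_table.get("table_type"),
            }
        },
        # Hypothetical grouping key: one entry per column of the table.
        "RelationalColumns": [
            {
                "displayName": column["name"],
                "description": column.get("comment"),
                "position": column.get("position"),
                "isNullable": column.get("nullable"),
            }
            for column in uc_table.get("columns", [])
        ],
    }


# Illustrative input; the identifier and values are made up for this example.
sample_table = {
    "table_id": "hypothetical-table-id",
    "name": "marksheet",
    "full_name": "unity.default.marksheet",
    "catalog_name": "unity",
    "schema_name": "default",
    "data_source_format": "DELTA",
    "table_type": "MANAGED",
    "columns": [{"name": "id", "position": 0, "nullable": False}],
}

mapped = map_table(sample_table, "http://localhost:8080")
print(mapped["ExternalIdentifier"]["mappingProperties"]["ucTableName"])  # marksheet
```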

Function field mapping

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version
function_id | ExternalIdentifier.identifier | Yes
name | DeployedAPI.name + DeployedSoftwareComponent.name + ExternalIdentifier.mappingProperties.ucFunctionName | Yes
full_name | DeployedAPI.resourceName | Yes
comment | DeployedAPI.description | Yes
owner | Ownership.owner | No
input_parameters.parameters.name | APIParameter.displayName | Yes
input_parameters.parameters.comment | APIParameter.description | Yes
input_parameters.parameters.position | APIParameter.position | Yes
input_parameters.parameters.nullable | APIParameter.isNullable | Yes
input_parameters.parameters.type_precision | APIParameter.precision | Yes
input_parameters.parameters.type_scale | APIParameter.significantDigits | Yes
input_parameters.parameters.type_text | TypeEmbeddedAttribute.displayName | Yes
input_parameters.parameters.type_name | TypeEmbeddedAttribute.dataType | Yes
input_parameters.parameters.parameter_default | TypeEmbeddedAttribute.defaultValue | Yes
input_parameters.parameters.type_interval_type | TypeEmbeddedAttribute.additionalProperties.ucTypeIntervalType | Yes
input_parameters.parameters.type_json | TypeEmbeddedAttribute.additionalProperties.ucTypeJSON | Yes
input_parameters.parameters.parameter_mode | APIParameter.additionalProperties.ucParameterMode | Yes
input_parameters.parameters.parameter_type | APIParameter.parameterType | Yes
return_parameters | DataAssetEncoding.encoding | Yes
return_parameters.parameters.name | APIParameter.displayName | Yes
return_parameters.parameters.comment | APIParameter.description | Yes
return_parameters.parameters.position | APIParameter.position | Yes
return_parameters.parameters.nullable | APIParameter.isNullable | Yes
return_parameters.parameters.type_precision | APIParameter.precision | Yes
return_parameters.parameters.type_scale | APIParameter.significantDigits | Yes
return_parameters.parameters.type_text | TypeEmbeddedAttribute.displayName | Yes
return_parameters.parameters.type_name | TypeEmbeddedAttribute.dataType | Yes
return_parameters.parameters.parameter_default | TypeEmbeddedAttribute.defaultValue | Yes
return_parameters.parameters.type_interval_type | TypeEmbeddedAttribute.additionalProperties.ucTypeIntervalType | Yes
return_parameters.parameters.type_json | TypeEmbeddedAttribute.additionalProperties.ucTypeJSON | Yes
return_parameters.parameters.parameter_mode | APIParameter.additionalProperties.ucParameterMode | Yes
return_parameters.parameters.parameter_type | APIParameter.parameterType | Yes
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes
created_by | ExternalIdentifier.externalInstanceCreatedBy | No
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No
catalog_name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes
schema_name | ExternalIdentifier.mappingProperties.ucSchemaName | Yes
storage_location | PropertyFacet.properties.ucStorageLocation | Yes
data_type | PropertyFacet.properties.ucFunctionDataType | Yes
full_data_type | PropertyFacet.properties.ucFunctionFullDataType | Yes
routine_parameter_style | PropertyFacet.properties.ucRoutineParameterStyle | Yes
security_type | PropertyFacet.properties.ucSecurityType | Yes
specific_name | PropertyFacet.properties.ucSpecificName | Yes
metastore_id | PropertyFacet.properties.ucMetastoreId | No
browse_only | PropertyFacet.properties.ucBrowseOnly | No
securable_type | PropertyFacet.properties.ucSecurableType | No
securable_kind | PropertyFacet.properties.ucSecurableKind | No
resource_name | PropertyFacet.properties.ucResourceName | No
routine_dependencies | ProcessCall relationship | Yes
routine_definition | DeployedSoftwareComponent.description | Yes
external_language | DeployedSoftwareComponent.implementationLanguage | Yes
is_null_call | DeployedSoftwareComponent.additionalProperties.ucIsNullCall | Yes
sql_data_access | DeployedSoftwareComponent.additionalProperties.ucSQLDataAccess | Yes
is_deterministic | DeployedSoftwareComponent.additionalProperties.ucIsDeterministic | Yes
routine_body | DeployedSoftwareComponent.additionalProperties.ucRoutineBodyType | Yes

Additional Values | Egeria Type and Attribute | Supported in OSS Version
Unity Catalog Function: Server URL : full_name | DeployedAPI.qualifiedName | Yes
Unity Catalog Function | DeployedAPI.deployedImplementationType + DeployedSoftwareComponent.deployedImplementationType | Yes
Databricks Unity Catalog Function: Server URL : full_name | DeployedAPI.qualifiedName | No
Databricks Unity Catalog Function | DeployedAPI.deployedImplementationType + DeployedSoftwareComponent.deployedImplementationType | No
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes

The templates that implement this mapping are described in Unity Catalog Templates.

Anchor design for Unity Catalog

To ensure correct delete semantics, each Unity Catalog resource is its own anchored structure. In addition, each resource is anchored to its parent: each table, function and volume is anchored to its schema, and each schema is anchored to its catalog. The catalogs are anchored to their appropriate server.

As a result, if a catalog is deleted, for example, all the schemas, tables, functions and volumes nested underneath it are deleted too, ensuring there are no orphaned fragments of metadata left in the repository.
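
The cascading effect of anchoring can be sketched with a toy in-memory store. This is not how the open metadata repository is implemented; the `MetadataStore` class and the element names are hypothetical and only model the parent anchoring described above.

```python
from collections import defaultdict
from typing import Optional


class MetadataStore:
    """Toy store (hypothetical) that records each element's parent anchor."""

    def __init__(self) -> None:
        self.parent = {}                  # element name -> anchor name (or None)
        self.children = defaultdict(set)  # anchor name -> names anchored to it

    def add(self, name: str, anchor: Optional[str] = None) -> None:
        self.parent[name] = anchor
        if anchor is not None:
            self.children[anchor].add(name)

    def delete(self, name: str) -> list:
        """Delete an element and everything anchored beneath it."""
        deleted = []
        # Remove the nested elements first, in a stable (sorted) order.
        for child in sorted(self.children.pop(name, set())):
            deleted.extend(self.delete(child))
        anchor = self.parent.pop(name)
        if anchor is not None and anchor in self.children:
            self.children[anchor].discard(name)
        deleted.append(name)
        return deleted


store = MetadataStore()
store.add("server")
store.add("unity", anchor="server")
store.add("unity.default", anchor="unity")
store.add("unity.default.marksheet", anchor="unity.default")
store.add("unity.default.json_files", anchor="unity.default")

print(store.delete("unity"))
# ['unity.default.json_files', 'unity.default.marksheet', 'unity.default', 'unity']
print(sorted(store.parent))  # ['server']
```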

Metadata Collections

Each catalog in a Unity Catalog server is assigned its own metadata collection. The schemas, tables, functions and volumes within the catalog are all part of the catalog's metadata collection, making it easy to identify the origin of these metadata elements.

Metadata Collections for Unity Catalog Resources

The Unity Catalog connectors also use the metadata collections to scope the metadata they are processing.
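
Scoping by metadata collection amounts to a filter on each element's origin. The sketch below assumes, purely for illustration, that each element is a dictionary with a `metadataCollectionName` field and that the collection is named after its catalog; neither detail is taken from Egeria's API.

```python
def in_collection(elements: list[dict], collection_name: str) -> list[dict]:
    """Keep only the elements whose origin is the named metadata collection."""
    return [
        element for element in elements
        if element.get("metadataCollectionName") == collection_name
    ]


# Illustrative elements from two different catalogs.
elements = [
    {"name": "unity.default.marksheet", "metadataCollectionName": "unity"},
    {"name": "sales.raw.orders", "metadataCollectionName": "sales"},
]

print([e["name"] for e in in_collection(elements, "unity")])
# ['unity.default.marksheet']
```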

Unity Catalog Connectors

The connectors shipped with Egeria are as follows:

Unity Catalog Connectors

Connector Name | Connector Type | Purpose
Unity Catalog Resource Connector | Digital Resource Connector | Provides a wrapper around Unity Catalog's REST API.
Unity Catalog Server Survey | Survey Action Service | Surveys the contents of a Unity Catalog Server.
Unity Catalog Catalog Survey | Survey Action Service | Surveys the contents of a Unity Catalog Catalog.
Unity Catalog Schema Survey | Survey Action Service | Surveys the contents of a Unity Catalog Schema.
Unity Catalog Server Synchronizer | Integration Connector | Bootstraps the cataloguing of a Unity Catalog Server by retrieving the catalogs and configuring the Unity Catalog Inside Catalog Synchronizer (below).
Unity Catalog Inside Catalog Synchronizer | Integration Connector | Synchronizes the metadata describing a Unity Catalog Server's catalogs, schemas, tables, functions and volumes between Unity Catalog and the Open Metadata Ecosystem.
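
The bootstrapping role of the server synchronizer can be pictured roughly as follows. This is a simplified sketch, not the connector's code: `bootstrap_catalog_targets` and `fake_list_catalogs` are hypothetical names, and the catalog list is passed in as a callable so that the example runs without a Unity Catalog server (in practice it would come from the resource connector that wraps the REST API).

```python
from typing import Callable, Iterable


def bootstrap_catalog_targets(
    list_catalogs: Callable[[], Iterable[dict]],
    known_targets: set,
) -> list:
    """Return the catalogs that still need to be handed to the inside-catalog synchronizer."""
    newly_configured = []
    for catalog in list_catalogs():
        name = catalog["name"]
        if name not in known_targets:
            # Remember the catalog so that it is configured only once.
            known_targets.add(name)
            newly_configured.append(name)
    return newly_configured


def fake_list_catalogs() -> list:
    # Stand-in for a call to the Unity Catalog server; values are illustrative.
    return [{"name": "unity"}, {"name": "sales"}]


targets = {"unity"}
print(bootstrap_catalog_targets(fake_list_catalogs, targets))  # ['sales']
print(bootstrap_catalog_targets(fake_list_catalogs, targets))  # []
```

Running the function a second time returns an empty list, which mirrors the idea that each catalog only needs to be configured once.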