Unity Catalog
Unity Catalog is a data catalog that governs access to data. It typically manages data from data lakes and lakehouses, where much of the data is in Parquet format with a table abstraction over the top. Unity Catalog also supports folders of files (called Volumes) and functions. Unity Catalog can provide access to the data in its catalogs and run the functions.
The picture below shows Unity Catalog managing access to data in Delta Lake.
Internally, Unity Catalog's metadata is organized into catalogs. (So one way to think of Unity Catalog is as a 'catalog of catalogs'.) Each catalog has multiple schemas, and these contain the resources:
- Tables - these are virtual tables, typically backed by an Apache Parquet file.
- Functions - these are callable functions, typically implemented in SQL, but may be a callable external component.
- Volumes - these are collections of files.
As a result of this structure, the resources in Unity Catalog have a three-level name: catalogName.schemaName.resourceName.
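
The three-level name can be taken apart with a few lines of Python. This is a minimal sketch, not part of Egeria or Unity Catalog: the `ResourceName` type, the `parse_resource_name` function and the sample name are illustrative, and it assumes that none of the three parts contains a dot.

```python
from typing import NamedTuple


class ResourceName(NamedTuple):
    """The three parts of a Unity Catalog resource name."""
    catalog: str
    schema: str
    resource: str


def parse_resource_name(full_name: str) -> ResourceName:
    """Split 'catalogName.schemaName.resourceName' into its three parts.

    Assumption: the individual names never contain a '.' character.
    """
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.resource, got {full_name!r}")
    return ResourceName(*parts)


# Illustrative name only.
name = parse_resource_name("unity.default.marksheet")
print(name.catalog, name.schema, name.resource)  # prints: unity default marksheet
```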
Unity Catalog Technology Type Names
The technology type names (aka deployed implementation types) added to Egeria's reference data for Unity Catalog are:
- Unity Catalog Server - The OSS Unity Catalog (UC) Server is an operational data platform 'catalog of catalogs' that supports controlled access to data managed through a related data platforms.
- Unity Catalog Catalog - An operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.
- Unity Catalog Schema - A schema that organizes data assets for an operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.
- Unity Catalog Table - A relational table within the Unity Catalog (UC) 'catalog of catalogs'.
- Unity Catalog Function - A function found in Unity Catalog (UC) that is working with data.
- Unity Catalog Volume - A collection of related data files within the Unity Catalog (UC) 'catalog of catalogs'.
JSON output from tech type search for 'Unity Catalog'
{
  "class": "TechnologyTypeSummaryListResponse",
  "relatedHTTPCode": 200,
  "elements": [
    {
      "technologyTypeGUID": "2d89345f-2650-4c04-bd5c-8cdbab7a0b79",
      "qualifiedName": "Egeria:ValidMetadataValue:SoftwareServer:deployedImplementationType-(Unity Catalog Server)",
      "name": "Unity Catalog Server",
      "description": "The OSS Unity Catalog (UC) Server is an operational data platform 'catalog of catalogs' that supports controlled access to data managed through a related data platforms.",
      "category": "SoftwareServer:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "2b28dd27-3d4e-4c75-a3e8-cbbcbe8cb62f",
      "qualifiedName": "Egeria:ValidMetadataValue:Catalog:deployedImplementationType-(Unity Catalog Catalog)",
      "name": "Unity Catalog Catalog",
      "description": "An operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.",
      "category": "Catalog:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "c56ca4d1-ed5a-4b05-b75b-e4b6bd3500ff",
      "qualifiedName": "Egeria:ValidMetadataValue:DeployedDatabaseSchema:deployedImplementationType-(Unity Catalog Schema)",
      "name": "Unity Catalog Schema",
      "description": "A schema that organizes data assets for an operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.",
      "category": "DeployedDatabaseSchema:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "3a1ad610-f5c5-4aba-a766-63965ac528be",
      "qualifiedName": "Egeria:ValidMetadataValue:VirtualRelationalTable:deployedImplementationType-(Unity Catalog Table)",
      "name": "Unity Catalog Table",
      "description": "A relational table within the Unity Catalog (UC) 'catalog of catalogs'.",
      "category": "VirtualRelationalTable:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "7f15dd5f-7569-4697-a3f1-491e399f4351",
      "qualifiedName": "Egeria:ValidMetadataValue:DeployedAPI:deployedImplementationType-(Unity Catalog Function)",
      "name": "Unity Catalog Function",
      "description": "A function found in Unity Catalog (UC) that is working with data.",
      "category": "DeployedAPI:deployedImplementationType"
    },
    {
      "technologyTypeGUID": "dbabe8cb-345e-4665-a665-1bef56a26ecd",
      "qualifiedName": "Egeria:ValidMetadataValue:DataFolder:deployedImplementationType-(Unity Catalog Volume)",
      "name": "Unity Catalog Volume",
      "description": "A collection of related data files within the Unity Catalog (UC) 'catalog of catalogs'.",
      "category": "DataFolder:deployedImplementationType"
    }
  ]
}
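
A response of this shape is straightforward to post-process. The sketch below parses a trimmed copy of the output above (two of the six elements) and reads the part of `category` before the colon as the open metadata type name. That reading is an inference from the listing, not a documented contract, and the function name `open_metadata_types` is hypothetical.

```python
import json

# Trimmed copy of the response above: two of the six elements, two fields each.
response_text = """
{
  "class": "TechnologyTypeSummaryListResponse",
  "relatedHTTPCode": 200,
  "elements": [
    {"name": "Unity Catalog Server",
     "category": "SoftwareServer:deployedImplementationType"},
    {"name": "Unity Catalog Volume",
     "category": "DataFolder:deployedImplementationType"}
  ]
}
"""


def open_metadata_types(response: dict) -> dict:
    """Map each technology type name to the text before ':' in its category.

    Assumption: the category prefix is the open metadata type name.
    """
    return {
        element["name"]: element["category"].split(":", 1)[0]
        for element in response.get("elements", [])
    }


type_by_name = open_metadata_types(json.loads(response_text))
for technology_type, open_metadata_type in type_by_name.items():
    print(f"{technology_type} -> {open_metadata_type}")
```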
Open Metadata Type Mapping for Unity Catalog
The mapping from Unity Catalog metadata elements to the Open Metadata Types used in the Open Metadata Ecosystem is as follows:
Technology Type | Open Metadata Type |
---|---|
Unity Catalog Server | SoftwareServer |
Unity Catalog Catalog | Catalog |
Unity Catalog Schema | DeployedDatabaseSchema |
Unity Catalog Function | DeployedAPI with an associated DeployedSoftwareComponent for its implementation. |
Unity Catalog Table | VirtualRelationalTable with an associated DataFolder for its files. |
Unity Catalog Volume | DataFolder |
In addition, each of these elements has a PropertyFacet and an External Identifier attached. The property facet contains implementation-specific details. The external identifier includes the GUID from Unity Catalog plus other mapping values, such as the catalog name, schema name and short name. These values enable the Unity Catalog connectors to check that the name of an element has not changed since the last time it was retrieved from Unity Catalog.
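
The rename check described above can be sketched as a comparison between the stored mapping properties and the values just retrieved. This is an illustration rather than the connectors' actual logic: the `ExternalIdentifier` class and `changed_names` function are hypothetical stand-ins, and only the mapping property names (`ucCatalogName`, `ucSchemaName`, `ucTableName`) come from the field mapping tables on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ExternalIdentifier:
    """Hypothetical stand-in for the external identifier attached to an element."""
    identifier: str                                   # the id issued by Unity Catalog
    mapping_properties: Dict[str, str] = field(default_factory=dict)


def changed_names(stored: ExternalIdentifier,
                  current: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {property: (stored value, current value)} for every mapping
    property whose value differs from the one saved at the last retrieval."""
    return {
        key: (old, current[key])
        for key, old in stored.mapping_properties.items()
        if key in current and current[key] != old
    }


# Illustrative values only.
stored = ExternalIdentifier(
    identifier="hypothetical-table-id",
    mapping_properties={
        "ucCatalogName": "unity",
        "ucSchemaName": "default",
        "ucTableName": "marksheet",
    },
)
current = {
    "ucCatalogName": "unity",
    "ucSchemaName": "default",
    "ucTableName": "marksheet_v2",
}
print(changed_names(stored, current))
```

Because the identifier stays the same while the names change, a non-empty result indicates a rename rather than a new element.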
The diagram below illustrates the mapping of the Unity Catalog metadata resource to the Open Metadata Types.
Below is the field mapping from Unity Catalog to these Open Metadata Types. There is one pair of tables for each kind of element: catalog, schema, volume, table and function.

Mapping for Unity Catalog Catalog:

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
id | ExternalIdentifier.identifier | Yes |
name | Catalog.name | Yes |
comment | Catalog.description | Yes |
owner | Ownership.owner | No |
name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes |
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes |
created_by | ExternalIdentifier.externalInstanceCreatedBy | No |
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes |
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No |
catalog_type | PropertyFacet.properties.ucCatalogType | No |
metastore_id | PropertyFacet.properties.ucMetastoreId | No |
isolation_mode | PropertyFacet.properties.ucIsolationMode | No |
accessible_in_current_workspace | PropertyFacet.properties.ucAccessibleInCurrentWorkspace | No |
browse_only | PropertyFacet.properties.ucBrowseOnly | No |
securable_type | PropertyFacet.properties.ucSecurableType | No |
securable_kind | PropertyFacet.properties.ucSecurableKind | No |
Additional Values | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
Unity Catalog Catalog: Server URL : full_name | Catalog.qualifiedName | Yes |
Unity Catalog Catalog | Catalog.deployedImplementationType | Yes |
Databricks Unity Catalog Catalog: Server URL | Catalog.qualifiedName | No |
Databricks Unity Catalog Catalog | Catalog.deployedImplementationType | No |
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes |
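
The rows marked "Yes" in the two tables above can be read as a lookup from a Unity Catalog property to one or more Egeria attributes. The sketch below applies that lookup to a catalog description held in a Python dict. The function name, the flat `Type.attribute` keys in the result and the sample values are all illustrative assumptions; the real connectors and templates build Egeria elements rather than a dict.

```python
# Rows marked "Yes" for a catalog: Unity Catalog property -> Egeria attributes.
CATALOG_FIELD_MAP = {
    "id": ["ExternalIdentifier.identifier"],
    "name": ["Catalog.name", "ExternalIdentifier.mappingProperties.ucCatalogName"],
    "comment": ["Catalog.description"],
    "created_at": ["ExternalIdentifier.externalInstanceCreationTime"],
    "updated_at": ["ExternalIdentifier.externalInstanceLastUpdateTime"],
}


def map_catalog(uc_catalog: dict, server_url: str) -> dict:
    """Copy each supported Unity Catalog property to its Egeria attribute(s)."""
    mapped = {}
    for uc_property, targets in CATALOG_FIELD_MAP.items():
        if uc_property in uc_catalog:
            for target in targets:
                mapped[target] = uc_catalog[uc_property]
    # Additional values that do not come from a Unity Catalog property.
    mapped["Catalog.deployedImplementationType"] = "Unity Catalog Catalog"
    mapped["ExternalIdentifier.mappingProperties.serverNetworkAddress"] = server_url
    return mapped


# Illustrative catalog description; the values are made up.
sample = {
    "id": "hypothetical-catalog-id",
    "name": "unity",
    "comment": "Main catalog",
    "created_at": 1720000000000,
}
for attribute, value in map_catalog(sample, "http://localhost:8080").items():
    print(f"{attribute} = {value}")
```

Properties missing from the input, such as `updated_at` here, are simply skipped.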

Mapping for Unity Catalog Schema:

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
schema_id | ExternalIdentifier.identifier | Yes |
name | DeployedDatabaseSchema.name | Yes |
full_name | DeployedDatabaseSchema.resourceName | Yes |
comment | DeployedDatabaseSchema.description | Yes |
owner | Ownership.owner | No |
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes |
created_by | ExternalIdentifier.externalInstanceCreatedBy | No |
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes |
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No |
catalog_name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes |
name | ExternalIdentifier.mappingProperties.ucSchemaName | Yes |
catalog_type | PropertyFacet.properties.ucCatalogType | No |
metastore_id | PropertyFacet.properties.ucMetastoreId | No |
browse_only | PropertyFacet.properties.ucBrowseOnly | No |
securable_type | PropertyFacet.properties.ucSecurableType | No |
securable_kind | PropertyFacet.properties.ucSecurableKind | No |
properties | PropertyFacet.properties | No |
Additional Values | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
Unity Catalog Schema: Server URL : full_name | DeployedDatabaseSchema.qualifiedName | Yes |
Unity Catalog Schema | DeployedDatabaseSchema.deployedImplementationType | Yes |
Databricks Unity Catalog Schema: Server URL : full_name | DeployedDatabaseSchema.qualifiedName | No |
Databricks Unity Catalog Schema | DeployedDatabaseSchema.deployedImplementationType | No |
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes |

Mapping for Unity Catalog Volume:

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
volume_id | ExternalIdentifier.identifier | Yes |
name | FileFolder.name + ExternalIdentifier.mappingProperties.ucVolumeName | Yes |
full_name | FileFolder.resourceName | Yes |
comment | FileFolder.description | Yes |
storage_location | FileFolder.pathName + Endpoint.networkAddress + PropertyFacet.properties.ucStorageLocation | Yes |
owner | Ownership.owner | No |
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes |
created_by | ExternalIdentifier.externalInstanceCreatedBy | No |
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes |
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No |
catalog_name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes |
schema_name | ExternalIdentifier.mappingProperties.ucSchemaName | Yes |
volume_type | PropertyFacet.properties.ucVolumeType | Yes |
metastore_id | PropertyFacet.properties.ucMetastoreId | No |
browse_only | PropertyFacet.properties.ucBrowseOnly | No |
securable_type | PropertyFacet.properties.ucSecurableType | No |
securable_kind | PropertyFacet.properties.ucSecurableKind | No |
resource_name | PropertyFacet.properties.ucResourceName | No |
Additional Values | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
Unity Catalog Volume: Server URL : full_name | FileFolder.qualifiedName | Yes |
Unity Catalog Volume | FileFolder.deployedImplementationType | Yes |
Databricks Unity Catalog Volume: Server URL : full_name | FileFolder.qualifiedName | No |
Databricks Unity Catalog Volume | FileFolder.deployedImplementationType | No |
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes |

Mapping for Unity Catalog Table:

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
table_id | ExternalIdentifier.identifier | Yes |
name | VirtualRelationalTable.name + ExternalIdentifier.mappingProperties.ucTableName | Yes |
full_name | VirtualRelationalTable.resourceName | Yes |
comment | VirtualRelationalTable.description | Yes |
owner | Ownership.owner | No |
data_source_format | DataAssetEncoding.encoding | Yes |
columns.name | RelationalColumn.displayName | Yes |
columns.comment | RelationalColumn.description | Yes |
columns.position | RelationalColumn.position | Yes |
columns.nullable | RelationalColumn.isNullable | Yes |
columns.type_precision | RelationalColumn.precision | Yes |
columns.type_scale | RelationalColumn.significantDigits | Yes |
columns.partition_index | RelationalColumn.additionalProperties.ucPartitionIndex | Yes |
columns.type_text | TypeEmbeddedAttribute.displayName | Yes |
columns.type_name | TypeEmbeddedAttribute.dataType | Yes |
columns.type_interval_type | TypeEmbeddedAttribute.additionalProperties.ucTypeIntervalType | Yes |
columns.type_json | TypeEmbeddedAttribute.additionalProperties.ucTypeJSON | Yes |
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes |
created_by | ExternalIdentifier.externalInstanceCreatedBy | No |
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes |
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No |
catalog_name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes |
schema_name | ExternalIdentifier.mappingProperties.ucSchemaName | Yes |
storage_location | PropertyFacet.properties.ucStorageLocation | Yes |
table_type | PropertyFacet.properties.ucTableType | Yes |
metastore_id | PropertyFacet.properties.ucMetastoreId | No |
browse_only | PropertyFacet.properties.ucBrowseOnly | No |
securable_type | PropertyFacet.properties.ucSecurableType | No |
securable_kind | PropertyFacet.properties.ucSecurableKind | No |
resource_name | PropertyFacet.properties.ucResourceName | No |
Additional Values | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
Unity Catalog Table: Server URL : full_name | VirtualRelationalTable.qualifiedName | Yes |
Unity Catalog Table | VirtualRelationalTable.deployedImplementationType | Yes |
Databricks Unity Catalog Table: Server URL : full_name | VirtualRelationalTable.qualifiedName | No |
Databricks Unity Catalog Table | VirtualRelationalTable.deployedImplementationType | No |
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes |

Mapping for Unity Catalog Function:

Unity Catalog Property | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
function_id | ExternalIdentifier.identifier | Yes |
name | DeployedAPI.name + DeployedSoftwareComponent.name + ExternalIdentifier.mappingProperties.ucFunctionName | Yes |
full_name | DeployedAPI.resourceName | Yes |
comment | DeployedAPI.description | Yes |
owner | Ownership.owner | No |
input_parameters.parameters.name | APIParameter.displayName | Yes |
input_parameters.parameters.comment | APIParameter.description | Yes |
input_parameters.parameters.position | APIParameter.position | Yes |
input_parameters.parameters.nullable | APIParameter.isNullable | Yes |
input_parameters.parameters.type_precision | APIParameter.precision | Yes |
input_parameters.parameters.type_scale | APIParameter.significantDigits | Yes |
input_parameters.parameters.type_text | TypeEmbeddedAttribute.displayName | Yes |
input_parameters.parameters.type_name | TypeEmbeddedAttribute.dataType | Yes |
input_parameters.parameters.parameter_default | TypeEmbeddedAttribute.defaultValue | Yes |
input_parameters.parameters.type_interval_type | TypeEmbeddedAttribute.additionalProperties.ucTypeIntervalType | Yes |
input_parameters.parameters.type_json | TypeEmbeddedAttribute.additionalProperties.ucTypeJSON | Yes |
input_parameters.parameters.parameter_mode | APIParameter.additionalProperties.ucParameterMode | Yes |
input_parameters.parameters.parameter_type | APIParameter.parameterType | Yes |
return_parameters | DataAssetEncoding.encoding | Yes |
return_parameters.parameters.name | APIParameter.displayName | Yes |
return_parameters.parameters.comment | APIParameter.description | Yes |
return_parameters.parameters.position | APIParameter.position | Yes |
return_parameters.parameters.nullable | APIParameter.isNullable | Yes |
return_parameters.parameters.type_precision | APIParameter.precision | Yes |
return_parameters.parameters.type_scale | APIParameter.significantDigits | Yes |
return_parameters.parameters.type_text | TypeEmbeddedAttribute.displayName | Yes |
return_parameters.parameters.type_name | TypeEmbeddedAttribute.dataType | Yes |
return_parameters.parameters.parameter_default | TypeEmbeddedAttribute.defaultValue | Yes |
return_parameters.parameters.type_interval_type | TypeEmbeddedAttribute.additionalProperties.ucTypeIntervalType | Yes |
return_parameters.parameters.type_json | TypeEmbeddedAttribute.additionalProperties.ucTypeJSON | Yes |
return_parameters.parameters.parameter_mode | APIParameter.additionalProperties.ucParameterMode | Yes |
return_parameters.parameters.parameter_type | APIParameter.parameterType | Yes |
created_at | ExternalIdentifier.externalInstanceCreationTime | Yes |
created_by | ExternalIdentifier.externalInstanceCreatedBy | No |
updated_at | ExternalIdentifier.externalInstanceLastUpdateTime | Yes |
updated_by | ExternalIdentifier.externalInstanceLastUpdatedBy | No |
catalog_name | ExternalIdentifier.mappingProperties.ucCatalogName | Yes |
schema_name | ExternalIdentifier.mappingProperties.ucSchemaName | Yes |
storage_location | PropertyFacet.properties.ucStorageLocation | Yes |
data_type | PropertyFacet.properties.ucFunctionDataType | Yes |
full_data_type | PropertyFacet.properties.ucFunctionFullDataType | Yes |
routine_parameter_style | PropertyFacet.properties.ucRoutineParameterStyle | Yes |
security_type | PropertyFacet.properties.ucSecurityType | Yes |
specific_name | PropertyFacet.properties.ucSpecificName | Yes |
metastore_id | PropertyFacet.properties.ucMetastoreId | No |
browse_only | PropertyFacet.properties.ucBrowseOnly | No |
securable_type | PropertyFacet.properties.ucSecurableType | No |
securable_kind | PropertyFacet.properties.ucSecurableKind | No |
resource_name | PropertyFacet.properties.ucResourceName | No |
routine_dependencies | ProcessCall relationship | Yes |
routine_definition | DeployedSoftwareComponent.description | Yes |
external_language | DeployedSoftwareComponent.implementationLanguage | Yes |
is_null_call | DeployedSoftwareComponent.additionalProperties.ucIsNullCall | Yes |
sql_data_access | DeployedSoftwareComponent.additionalProperties.ucSQLDataAccess | Yes |
is_deterministic | DeployedSoftwareComponent.additionalProperties.ucIsDeterministic | Yes |
routine_body | DeployedSoftwareComponent.additionalProperties.ucRoutineBodyType | Yes |
Additional Values | Egeria Type and Attribute | Supported in OSS Version |
---|---|---|
Unity Catalog Function: Server URL : full_name | DeployedAPI.qualifiedName | Yes |
Unity Catalog Function | DeployedAPI.deployedImplementationType + DeployedSoftwareComponent.deployedImplementationType | Yes |
Databricks Unity Catalog Function: Server URL : full_name | DeployedAPI.qualifiedName | No |
Databricks Unity Catalog Function | DeployedAPI.deployedImplementationType + DeployedSoftwareComponent.deployedImplementationType | No |
Server URL | ExternalIdentifier.mappingProperties.serverNetworkAddress | Yes |
The templates that implement this mapping are described in Unity Catalog Templates.
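
The "Additional Values" tables above describe qualified names as the technology type name, the server URL and the full name joined by colons. A minimal sketch of that pattern follows. The exact delimiter and spacing used by the templates are not stated here, so the plain `:` separator is an assumption, as are the function name and the sample values.

```python
def qualified_name(technology_type: str, server_url: str, full_name: str) -> str:
    """Join the three parts with ':' (assumed separator, no surrounding spaces)."""
    return f"{technology_type}:{server_url}:{full_name}"


# Illustrative server URL and resource name.
print(qualified_name("Unity Catalog Table",
                     "http://localhost:8080",
                     "unity.default.marksheet"))
```

Including the server URL keeps the qualified name unique when the same three-level name exists on more than one Unity Catalog server.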
Anchor design for Unity Catalog
In order to have correct delete semantics, each of the Unity Catalog resources is its own anchored structure. In addition, each resource is anchored to its parent: each table, function and volume is anchored to its schema, each schema is anchored to its catalog, and each catalog is anchored to its server.
As a result, if a catalog is deleted, for example, all the schemas, tables, functions and volumes nested underneath it are deleted too, ensuring that no orphaned fragments of metadata are left in the repository.
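
The effect of anchoring on deletes can be illustrated with a toy in-memory store. This is a sketch of the idea only: the `Repository` class is hypothetical and is not Egeria's API, and the element names are made up.

```python
from typing import Dict, Optional, Set


class Repository:
    """Toy in-memory store in which each element may be anchored to a parent."""

    def __init__(self) -> None:
        self.elements: Set[str] = set()
        self.anchor_of: Dict[str, str] = {}        # element -> its anchor
        self.anchored: Dict[str, Set[str]] = {}    # anchor -> elements anchored to it

    def add(self, name: str, anchor: Optional[str] = None) -> None:
        self.elements.add(name)
        if anchor is not None:
            self.anchor_of[name] = anchor
            self.anchored.setdefault(anchor, set()).add(name)

    def delete(self, name: str) -> None:
        """Delete an element and, recursively, everything anchored to it."""
        for child in self.anchored.pop(name, set()):
            self.delete(child)
        parent = self.anchor_of.pop(name, None)
        if parent is not None and parent in self.anchored:
            self.anchored[parent].discard(name)
        self.elements.discard(name)


repo = Repository()
repo.add("server")
repo.add("unity", anchor="server")                    # catalog
repo.add("other", anchor="server")                    # a second catalog
repo.add("unity.default", anchor="unity")             # schema
repo.add("unity.default.marksheet", anchor="unity.default")   # table
repo.add("unity.default.json_files", anchor="unity.default")  # volume

repo.delete("unity")
print(sorted(repo.elements))  # prints: ['other', 'server']
```

Deleting the catalog removes its schema, table and volume, while the server and the other catalog are untouched.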
Metadata Collections
Each catalog in a Unity Catalog server is assigned its own metadata collection. The schemas, tables, functions and volumes within the catalog are all part of the catalog's metadata collection, making it easy to identify the origin of these metadata elements.
The Unity Catalog connectors also use the metadata collections to scope the metadata they are processing.
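
Scoping by metadata collection amounts to a filter on where each element originated. The sketch below shows the idea with hypothetical types: the `Element` class, the `in_scope` function and the collection names are illustrative, and how Egeria actually names a catalog's metadata collection is not covered here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Element:
    """Hypothetical metadata element tagged with its originating collection."""
    name: str
    metadata_collection: str


def in_scope(elements: Iterable[Element], metadata_collection: str) -> List[Element]:
    """Keep only the elements that belong to the given metadata collection."""
    return [e for e in elements if e.metadata_collection == metadata_collection]


# Illustrative elements from two catalogs on the same server.
elements = [
    Element("unity.default.marksheet", "collection-for-unity"),
    Element("unity.default.json_files", "collection-for-unity"),
    Element("sales.raw.orders", "collection-for-sales"),
]
for element in in_scope(elements, "collection-for-unity"):
    print(element.name)
```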
Unity Catalog Connectors
The connectors shipped with Egeria are as follows:
Connector Name | Connector Type | Purpose |
---|---|---|
Unity Catalog Resource Connector | Digital Resource Connector | Provides a wrapper around Unity Catalog's REST API. |
Unity Catalog Server Survey | Survey Action Service | Surveys the contents of a Unity Catalog Server. |
Unity Catalog Catalog Survey | Survey Action Service | Surveys the contents of a Unity Catalog Catalog. |
Unity Catalog Schema Survey | Survey Action Service | Surveys the contents of a Unity Catalog Schema. |
Unity Catalog Server Synchronizer | Integration Connector | Bootstraps the cataloguing of a Unity Catalog Server by retrieving the catalogs and configuring the Unity Catalog Inside Catalog Synchronizer (below). |
Unity Catalog Inside Catalog Synchronizer | Integration Connector | Synchronizes the metadata describing a Unity Catalog Server's catalogs, schemas, tables, functions and volumes between Unity Catalog and the Open Metadata Ecosystem. |