Technical preview
A technical preview function is in a state where it can be tried. The development is complete, there is documentation, and there are samples, tutorials and hands-on labs as appropriate.
The community is looking for feedback on the function before declaring it stable. This feedback may result in changes to the external interfaces.
Asset Lineage Open Metadata Access Service (OMAS)
Overview
Asset Lineage is a metadata access service (OMAS) that consolidates and exports lineage metadata for assets in a cohort. It does this by actively watching the cohort and discovering specific asset types and their relationships to other assets or metadata elements.
The following asset types are considered:
- DataStore (subtypes)
- Process
- DataSet
Asset Lineage OMAS then builds complex graph structures (sometimes called the asset context) that are sent to an out topic for further preservation and use with the Open Lineage Server.
This works well when metadata is actively shared on the cohort as it is created. In a different scenario, an additional repository that is already populated with existing metadata can join the cohort. Asset Lineage OMAS offers endpoints to handle this as well, allowing an external system to request (or actively poll for) and extract the metadata relevant for building the lineage graph.
In all cases, Asset Lineage OMAS relies on the underlying Enterprise Repository Services (OMRS) subsystem to find and consolidate metadata by combining the different elements available across the cohort.
User Guide
Most interaction with the Asset Lineage OMAS is driven by the external tools used to build lineage through Data Engine OMAS and Data Engine Proxy, or by lineage integrated through Integration Services (OMIS) such as the Database Integrator or Files Integrator.
When enabled in an OMAG Metadata Access Server, it subscribes to the enterprise cohort topic and uses the following event triggers:
Relationship events that build the lineage graph:
- Lineage Mappings between Assets
- Lineage Mappings between Schema Elements
- Semantic Assignments between Glossary Terms and Schema Elements
Entity events that feed in changes to assets that are created or updated:
- Data Stores
- Processes
Interface choices
- Java client to integrate within Java programs,
- REST API to interact with external/remote systems,
- Out Topic Events to publish lineage related events.
Out Topic Events
LineageEntityEvent
{
  "class": "LineageEntityEvent",
  "eventVersionId": 1,
  "assetLineageEventType": "UPDATE_ENTITY_EVENT",
  "lineageEntity": {
    "guid": "a8f71cfc-bd59-440e-afb9-9719d60a8fe3",
    "typeDefName": "Process",
    "createdBy": "cocoETLnpa",
    "updatedBy": "cocoETLnpa",
    "createTime": 1636326254855,
    "updateTime": 1636326255079,
    "version": 2,
    "metadataCollectionId": "9beaa80a-50d9-44ba-b2ae-18f2379c9aa4",
    "properties": {
      "displayName": "ConvertFileToCSV",
      "qualifiedName": "ConvertFileToCSV@CocoPharma/DataEngine/CocoETL",
      "description": "Process named 'ConvertFileToCSV' representing high level processing activity performed by CocoETL tool."
    }
  }
}
LineageRelationshipEvent
{
  "class": "LineageRelationshipEvent",
  "eventVersionId": 1,
  "assetLineageEventType": "NEW_RELATIONSHIP_EVENT",
  "lineageRelationship": {
    "guid": "aa2b43c5-eb9a-49bf-93fd-aeddf55812f3",
    "typeDefName": "LineageMapping",
    "createdBy": "cocoETLnpa",
    "updatedBy": null,
    "createTime": 1636326312659,
    "updateTime": null,
    "version": 1,
    "metadataCollectionId": "9beaa80a-50d9-44ba-b2ae-18f2379c9aa4",
    "properties": {},
    "sourceEntity": {
      "guid": "b117f8a5-deaf-47d1-80fe-15425b5b7f1f",
      "typeDefName": "DataFile",
      "createdBy": "cocoETLnpa",
      "updatedBy": null,
      "createTime": 1636326233314,
      "updateTime": null,
      "version": 1,
      "metadataCollectionId": null,
      "properties": {
        "qualifiedName": "file://secured/research/previous-clinical-trials/old-archive.dat@CocoPharma/DataEngine/CocoETL"
      }
    },
    "targetEntity": {
      "guid": "a8f71cfc-bd59-440e-afb9-9719d60a8fe3",
      "typeDefName": "Process",
      "createdBy": "cocoETLnpa",
      "updatedBy": "cocoETLnpa",
      "createTime": 1636326254855,
      "updateTime": 1636326255079,
      "version": 2,
      "metadataCollectionId": null,
      "properties": {
        "qualifiedName": "ConvertFileToCSV@CocoPharma/DataEngine/CocoETL"
      }
    }
  }
}
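A consumer of the out topic has to tell the two payload shapes apart before it can read them. The sketch below is one possible way to do that in Python, dispatching on the "class" field and using only the field names visible in the sample events above. The function name summarize_lineage_event and the trimmed sample payload are illustrative assumptions, not part of the Asset Lineage OMAS client API.

```python
import json


def summarize_lineage_event(payload: str) -> str:
    """Return a one-line summary of an out topic event (hypothetical helper)."""
    event = json.loads(payload)
    event_class = event.get("class")
    event_type = event.get("assetLineageEventType")

    if event_class == "LineageEntityEvent":
        entity = event["lineageEntity"]
        return f"{event_type}: {entity['typeDefName']} {entity['guid']}"

    if event_class == "LineageRelationshipEvent":
        relationship = event["lineageRelationship"]
        source = relationship["sourceEntity"]
        target = relationship["targetEntity"]
        return (
            f"{event_type}: {relationship['typeDefName']} "
            f"{source['typeDefName']} -> {target['typeDefName']}"
        )

    # Other event classes may exist; this sketch only knows the two shown above.
    raise ValueError(f"unsupported event class: {event_class!r}")


# Trimmed copy of the LineageRelationshipEvent sample, for illustration only.
sample = json.dumps({
    "class": "LineageRelationshipEvent",
    "assetLineageEventType": "NEW_RELATIONSHIP_EVENT",
    "lineageRelationship": {
        "typeDefName": "LineageMapping",
        "sourceEntity": {"typeDefName": "DataFile"},
        "targetEntity": {"typeDefName": "Process"},
    },
})
print(summarize_lineage_event(sample))
```

Reading the payload from the topic itself (for example with a Kafka consumer) is left out, because the transport depends on how the out topic connection is configured.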
REST API
Publish Entity
Find the entity by GUID and publish the context for it.
GET - Request that lineage be published for a single entity (asset or term)
{{platformURLRoot}}/servers/{{serverName}}/open-metadata/access-services/asset-lineage/users/{{userId}}/publish-entity/{{entityTypeName}}/{{guid}}
Publish Entities
Scan the cohort for entities of the given type and publish the contexts of the entities found to the out topic.
GET - Request that lineage be published for all entities of a type (asset or term)
{{platformURLRoot}}/servers/{{serverName}}/open-metadata/access-services/asset-lineage/users/{{userId}}/publish-entities/{{entityTypeName}}
Publish Context
Find the entity by GUID and publish the asset context for it. This applies to data tables and files.
GET - Request asset context
{{platformURLRoot}}/servers/{{serverName}}/open-metadata/access-services/asset-lineage/users/{{userId}}/publish-context/{{entityTypeName}}/{{guid}}
Out Topic Connection
Return the connection object for the Asset Lineage OMAS out topic.
GET - Output topic OCF connection
{{platformURLRoot}}/servers/{{serverName}}/open-metadata/access-services/asset-lineage/users/{{userId}}/topics/out-topic-connection/{{callerId}}
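The four GET endpoints above share a common prefix and differ only in their trailing path. As a rough sketch, the Python below fills in the placeholders to produce a concrete URL. It writes every placeholder in double-brace form, and the names ENDPOINTS and build_url, along with the server name, user id and platform root used in the example call, are assumptions made for illustration.

```python
import re
from urllib.parse import quote

# Common prefix of the Asset Lineage OMAS endpoints listed above.
ASSET_LINEAGE_BASE = (
    "{{platformURLRoot}}/servers/{{serverName}}"
    "/open-metadata/access-services/asset-lineage/users/{{userId}}"
)

# Trailing path of each GET endpoint, keyed by a short label.
ENDPOINTS = {
    "publish-entity": "/publish-entity/{{entityTypeName}}/{{guid}}",
    "publish-entities": "/publish-entities/{{entityTypeName}}",
    "publish-context": "/publish-context/{{entityTypeName}}/{{guid}}",
    "out-topic-connection": "/topics/out-topic-connection/{{callerId}}",
}


def build_url(endpoint: str, **values: str) -> str:
    """Fill the {{placeholders}} of one endpoint template (hypothetical helper)."""
    template = ASSET_LINEAGE_BASE + ENDPOINTS[endpoint]

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing value for placeholder {name}")
        value = values[name]
        # The platform root is already a URL; every other value is one path segment.
        if name == "platformURLRoot":
            return value.rstrip("/")
        return quote(value, safe="")

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# Example values are placeholders, not defaults of any real deployment.
url = build_url(
    "publish-entity",
    platformURLRoot="https://localhost:9443",
    serverName="cocoMDS1",
    userId="garygeeke",
    entityTypeName="Process",
    guid="a8f71cfc-bd59-440e-afb9-9719d60a8fe3",
)
print(url)
```

Percent-encoding each path segment keeps values such as a user id containing a slash or space from changing the structure of the URL.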
Configuration
POST - Enable Asset Lineage OMAS with accessServiceOptions
{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/access-services/asset-lineage
{
  "LineagePublisherBatchSize": 100,
  "LineageClassificationTypes": [
    "PrimaryCategory",
    "Confidentiality",
    "AssetZoneMembership",
    "SubjectArea",
    "AssetOwnership"
  ]
}
Detailed description of the properties
| Property | Description |
|---|---|
| LineagePublisherBatchSize | Number of elements to be sent in a single event. This parameter is used to optimize event payload size and allows multiple elements to be grouped in a batch. Default is 1. |
| LineageClassificationTypes | List of classification types considered while producing lineage events. The access service is always preconfigured with the default set listed in the example request body. Additional types can be added when necessary; they are always merged with the default set. |
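To make the two options more concrete, the Python sketch below models their described effect: extra classification types are merged with the default set, and elements are grouped into batches of the configured size. It only illustrates the behaviour described in the table and is not the access service's implementation. The function names are hypothetical, "Retention" is used purely as an example of an additional type, and the order of the merged list is an assumption.

```python
# Default set, copied from the example request body above.
DEFAULT_CLASSIFICATION_TYPES = [
    "PrimaryCategory",
    "Confidentiality",
    "AssetZoneMembership",
    "SubjectArea",
    "AssetOwnership",
]


def merge_classification_types(additional: list[str]) -> list[str]:
    """Merge extra types with the defaults, dropping duplicates (illustrative)."""
    merged = list(DEFAULT_CLASSIFICATION_TYPES)
    for type_name in additional:
        if type_name not in merged:
            merged.append(type_name)
    return merged


def split_into_batches(elements: list, batch_size: int = 1) -> list[list]:
    """Group elements into batches of at most batch_size (the documented default is 1)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        elements[start:start + batch_size]
        for start in range(0, len(elements), batch_size)
    ]


merged_types = merge_classification_types(["Retention", "Confidentiality"])
print(merged_types)
print(split_into_batches(["e1", "e2", "e3", "e4", "e5"], batch_size=2))
```

With a batch size of 1 every element travels in its own event, whereas a larger value such as the 100 used in the example request body trades fewer events for bigger payloads.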