A technical preview function is in a state where it can be tried. The development is complete, there is documentation, and there are samples, tutorials and hands-on labs as appropriate.
The community is looking for feedback on the function before declaring it stable. This feedback may result in changes to the external interfaces.
Open Discovery Framework (ODF)¶
The Open Discovery Framework (ODF) enables metadata discovery tools to integrate with open metadata repositories by defining the interfaces for metadata discovery components (called discovery services) to:
- Access metadata discovery configuration.
- Search for assets in the metadata repository.
- Extract all the metadata known about a specific asset.
- Record the results of the analysis in the open metadata repository and attach it to the asset's metadata for later processing.
A discovery service provides specific analysis of the metadata and contents of an asset on request.
It is implemented as a specialized connector.
A discovery service is initialized with a connector to the asset it is to analyze and, if it is part of a discovery pipeline, details of the results of the discovery services that have run before it.
The result is one or more sets of related properties that the discovery service has discovered about the asset, its metadata, structure and/or content. These are stored in a set of discovery annotations linked off of a discovery analysis report. The discovery analysis report is linked off of the asset definition in the open metadata repository.
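The linkage described above (asset, then discovery analysis report, then discovery annotations) can be pictured with a minimal sketch. The class and attribute names below are illustrative assumptions for this example only; they are not the Egeria Java API or the open metadata type definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One set of related properties discovered about an asset (hypothetical shape)."""
    annotation_type: str
    properties: dict


@dataclass
class DiscoveryAnalysisReport:
    """Groups the annotations produced by one run of a discovery service."""
    discovery_service_name: str
    annotations: list = field(default_factory=list)


@dataclass
class Asset:
    """Stand-in for an asset definition in the open metadata repository."""
    qualified_name: str
    reports: list = field(default_factory=list)


asset = Asset("file:/data/customers.csv")
report = DiscoveryAnalysisReport("schema-discovery")
asset.reports.append(report)  # the report is linked off of the asset
report.annotations.append(    # the annotation is linked off of the report
    Annotation("Schema analysis annotation", {"columnCount": 12})
)
```

Navigating from the asset to its annotations then goes through the report, which keeps the results of separate discovery runs apart.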
A discovery context provides the discovery service with access to information about the discovery request along with the open metadata repository interfaces.
The discovery context provides parameters used by a discovery service to locate and analyze an asset and then record the results.
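As a rough illustration of this role, the sketch below models a discovery context as a small object that carries the request parameters and a store for recording results. All names (`DiscoveryContext`, `AnnotationStore`, `maxRows`, `count_rows_service`) are hypothetical and chosen for the example; the real ODF interfaces are Java and are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationStore:
    """Hypothetical in-memory stand-in for a store that records results."""
    saved: list = field(default_factory=list)

    def add_annotation(self, annotation: dict) -> None:
        self.saved.append(annotation)


@dataclass
class DiscoveryContext:
    """Hypothetical bundle of what a discovery service receives for one request."""
    asset_guid: str
    request_parameters: dict
    annotation_store: AnnotationStore


def count_rows_service(context: DiscoveryContext, rows: list) -> None:
    """Toy discovery service: examines rows and records one annotation."""
    # An assumed request parameter limits how many rows are examined.
    limit = context.request_parameters.get("maxRows", len(rows))
    context.annotation_store.add_annotation(
        {"asset": context.asset_guid, "rowsExamined": min(limit, len(rows))}
    )


context = DiscoveryContext("asset-1", {"maxRows": 2}, AnnotationStore())
count_rows_service(context, rows=["a", "b", "c"])
```

The service itself holds no repository connection details; everything it needs arrives through the context, which is the design point the paragraph above describes.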
Discovery request type¶
Each discovery request type is associated with a discovery service. When a discovery request is made to the discovery engine, it looks up the discovery request type and runs the associated discovery service.
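The lookup can be sketched as a simple registry keyed by request type. This is a minimal model under assumed names (`DiscoveryEngine`, `register`, `discover`, and the `profile-data` request type are all invented for illustration), not Egeria's implementation.

```python
from typing import Callable, Dict

# A discovery service is modelled here as a function from asset identifier to result.
DiscoveryService = Callable[[str], str]


class DiscoveryEngine:
    """Hypothetical engine that maps request types to discovery services."""

    def __init__(self) -> None:
        self._services: Dict[str, DiscoveryService] = {}

    def register(self, request_type: str, service: DiscoveryService) -> None:
        self._services[request_type] = service

    def discover(self, request_type: str, asset_guid: str) -> str:
        # Look up the request type, then run the associated service.
        try:
            service = self._services[request_type]
        except KeyError:
            raise ValueError(f"Unknown discovery request type: {request_type}") from None
        return service(asset_guid)


engine = DiscoveryEngine()
engine.register("profile-data", lambda guid: f"profiled {guid}")
result = engine.discover("profile-data", "asset-1")
```

Because callers name a request type rather than a service, the service behind a request type can be swapped without changing the caller.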
Implementation in Egeria¶
Egeria's discovery configuration server support is implemented by the Governance Engine OMAS.
It has a client called
implements the method equivalent to ODF's
The services used by the discovery services when they are running are implemented by the Discovery Engine OMAS.
It also supports event notifications through the Discovery Engine OMAS's out topic.
A discovery pipeline is a specialized implementation of a discovery service that runs a set of discovery services against a single asset. The implementation of the discovery pipeline determines the order in which these discovery services are run.
The aim of the discovery pipeline is to enable a detailed picture of the properties of an asset to be built up by the discovery services it calls. Each discovery service is able to access the results of the discovery services that have run before it.
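A sequential pipeline of this kind might look like the following sketch, in which each service receives the annotations produced so far. The function names and the dictionary-based annotation shape are assumptions made for the example; a real pipeline could choose a different ordering strategy.

```python
from typing import Callable, Dict, List

Annotations = List[Dict[str, object]]
# A service sees the asset's values plus the annotations from earlier services.
DiscoveryService = Callable[[List[str], Annotations], Annotations]


def count_values(rows: List[str], previous: Annotations) -> Annotations:
    return [{"type": "count", "value": len(rows)}]


def distinct_ratio(rows: List[str], previous: Annotations) -> Annotations:
    # Reuses the count recorded by the earlier service instead of recomputing it.
    count = next(a["value"] for a in previous if a["type"] == "count")
    return [{"type": "distinctRatio", "value": len(set(rows)) / count}]


def run_pipeline(services: List[DiscoveryService], rows: List[str]) -> Annotations:
    """Run the services in list order, accumulating their annotations."""
    annotations: Annotations = []
    for service in services:
        # Pass a copy so a service cannot alter earlier results.
        annotations.extend(service(rows, list(annotations)))
    return annotations


results = run_pipeline([count_values, distinct_ratio], ["a", "b", "a", "c"])
```

Here the order is fixed by the list passed to `run_pipeline`; `distinct_ratio` would fail if it ran first, which shows why the pipeline implementation owns the ordering decision.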
Some discovery annotations refer to an entire asset and others refer to a data field within an asset. The annotations that describe a single data field are called data field annotations.
|Classification annotation||Captures a recommendation of which classifications to attach to this asset. It can be made at the asset or data field level.|
|Data class annotation||Captures a recommendation of which data class this data field closely represents.|
|Data profile annotation||Captures the characteristics of the data values stored in a specific data field in a data source.|
|Data profile log annotation||Captures the names of the log files that hold the profile characteristics of the data values stored in a specific data field. This is used when the profile results are too large to store in open metadata.|
|Data source measurement annotation||Collects arbitrary properties about a data source.|
|Data source physical status annotation||Documents the physical characteristics of a data source asset.|
|Relationship advice annotation||Documents a recommended relationship that should be established with the asset.|
|Quality annotation||Documents calculated quality scores on different dimensions.|
|Schema analysis annotation||Documents the structure of the data (schema) inside the asset.|
|Semantic annotation||Documents suggested meanings for this data based on the values and name of the field.|
|Suspect duplicate annotation||Identifies other asset definitions that seem to point to the same physical asset.|
The open metadata types for discovery annotations are described in area 6 of the model.
The main entity type is called Annotation. It is extended by DataFieldAnnotation to distinguish annotations that refer primarily to a data field. Other, more specialist annotations extend these two basic annotation types.
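The two-level type hierarchy can be mirrored in a short sketch. Only the names Annotation and DataFieldAnnotation come from the text above; the attributes and the specialist `DataClassAnnotation` subclass shown here are illustrative assumptions, not the actual open metadata type definitions.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Base type: may describe an entire asset."""
    annotation_type: str
    summary: str


@dataclass
class DataFieldAnnotation(Annotation):
    """Annotation that refers primarily to a single data field."""
    data_field_name: str


@dataclass
class DataClassAnnotation(DataFieldAnnotation):
    """Illustrative specialist type extending the data field annotation."""
    candidate_data_classes: tuple


annotations = [
    Annotation("Quality annotation", "Overall quality score 0.92"),
    DataClassAnnotation(
        "Data class annotation", "Looks like a postcode", "zip", ("PostalCode",)
    ),
]

# Data field annotations can be separated from asset-level ones by type.
field_level = [a for a in annotations if isinstance(a, DataFieldAnnotation)]
```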
Discovery analysis report¶
The discovery analysis report is created in the open metadata repository by the discovery engine when it creates the discovery service instance. The discovery service can retrieve information about the discovery analysis report through the discovery analysis report store client.
Discovery analysis report store¶
The discovery analysis report store is a client to an open metadata server that enables a discovery service to query the properties of its discovery analysis report and update the analysis step that is currently executing.
The discovery analysis report store is accessed from the discovery annotation store.
The discovery analysis report store also enables a long-running discovery service (typically a discovery pipeline) to record its current analysis step.
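Recording the current analysis step could work along the lines of this sketch. The class, its methods, and the step names are hypothetical and held in memory only; the real store is a client to an open metadata server.

```python
from typing import List, Optional


class DiscoveryAnalysisReportStore:
    """Hypothetical in-memory stand-in for the report store client."""

    def __init__(self, report_name: str) -> None:
        self.report_name = report_name
        self._analysis_step: Optional[str] = None
        self.step_history: List[str] = []

    def get_analysis_step(self) -> Optional[str]:
        return self._analysis_step

    def set_analysis_step(self, step: str) -> None:
        self._analysis_step = step
        self.step_history.append(step)


def run_long_pipeline(store: DiscoveryAnalysisReportStore) -> None:
    """Toy long-running service that reports progress as it goes."""
    for step in ("schema analysis", "data profiling", "quality scoring"):
        store.set_analysis_step(step)  # record progress before doing the work
        # ... the analysis for this step would run here ...


store = DiscoveryAnalysisReportStore("report-for-asset-1")
run_long_pipeline(store)
```

Anyone inspecting the report while the pipeline is running can then see which step it has reached.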
Discovery annotation store¶
The discovery annotation store provides a discovery service with a client to write discovery annotations to an open metadata repository. These annotations describe the results of the analysis performed on an asset by the discovery service.
The annotations are linked to a discovery analysis report that is in turn linked off of the analyzed asset.
The discovery service is passed the discovery annotation store via the discovery context.
A discovery engine is the execution environment for discovery services.
The discovery engine configuration defines a set of discovery services. Its definition is stored in an open metadata repository and maintained through the Discovery Engine OMAS.
Discovery engines are hosted in discovery servers.
Egeria's implementation of the discovery engine is provided by the Asset Analysis OMES.
The discovery server is the server environment that hosts one or more discovery engines. Discovery servers are deployed close to the physical assets they are analyzing. They connect to the Discovery Engine OMAS running in a metadata access server to retrieve metadata about assets and to store the results of the discovery services' analysis. Many discovery servers can use the same metadata server.
Discovery configuration server¶
The discovery configuration server is the server responsible for holding and managing the configuration needed by the discovery servers and the discovery engines within them.
Discovery asset catalog store¶
The discovery asset catalog store provides a search interface that enables a discovery service to locate assets that are described in the open metadata repository.
The discovery service is passed the discovery asset catalog store via the discovery context.
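A search interface of this kind might be sketched as follows. The class name, the `find_assets_by_name` method, and the identifier-to-name mapping are assumptions for illustration; the actual search operations offered by the store are not listed in this document.

```python
import re
from typing import Dict, List


class DiscoveryAssetCatalogStore:
    """Hypothetical in-memory stand-in for the asset catalog search client."""

    def __init__(self, assets: Dict[str, str]) -> None:
        # Maps an assumed asset identifier to its display name.
        self._assets = dict(assets)

    def find_assets_by_name(self, pattern: str) -> List[str]:
        """Return identifiers of assets whose name matches the regular expression."""
        regex = re.compile(pattern, re.IGNORECASE)
        return sorted(
            guid for guid, name in self._assets.items() if regex.search(name)
        )


catalog = DiscoveryAssetCatalogStore({
    "asset-1": "Customer orders 2023",
    "asset-2": "Supplier invoices",
    "asset-3": "customer addresses",
})
matches = catalog.find_assets_by_name("customer")
```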
Egeria provides a full implementation of the ODF. It provides a discovery server as well as an implementation of the metadata server APIs by the Discovery Engine OMAS. There are also implementations of discovery services in the discovery-service-connectors module.