
Configuring a Lineage Warehouse

The Lineage Warehouse captures lineage from the open metadata ecosystem and maintains an optimized historical record of lineage for reporting. It is designed for use in regulated industries where lineage must be captured and retained.

The configuration for a lineage warehouse requires knowledge of the metadata access server that supplies open metadata via the Asset Lineage OMAS, and the location of the lineage warehouse's own store.

Configuration for a lineage warehouse

The configuration document is built up using a series of administration calls:

Configuring the Basic Server Properties

Configure the basic server properties

The basic server properties are used in logging and events originating from the server. They help to document the purpose of the server (which helps with problem determination) and enable performance improvements by allowing the server to ignore activity or metadata that is not relevant to its operation.

| Property | Description |
|---|---|
| `localServerDescription` | Description for the server. This is useful information for the administrator to understand the role of the server. The default value is null. |
| `organizationName` | Descriptive name for the organization that owns the local server/repository. This is useful when the open metadata repository cluster consists of metadata servers from different organizations, or different departments of an enterprise. The default value is null. |
| `localServerUserId` | UserId to use for server-initiated REST calls. The default is OMAGServer. |
| `localServerPassword` | Password to use for server-initiated REST calls. The default is null. This means that only the userId is sent in the HTTP header. |
| `localServerURL` | The URL of the platform where the server is to be deployed. It should be the value used by external services to call the server, since it is broadcast across an open metadata repository cohort and used when deploying the server's configuration document to the correct platform. |
| `maxPageSize` | The maximum page size that can be set on requests to the server. The default value is 1000. A value of zero means unlimited page size. Although supported, the zero value is not recommended because it provides no protection from a large request denial of service attack. |

Typically, these values are set up in a single command.

setBasicServerProperties

Set up the basic server properties in a single request. If any values are left blank, they are cleared in the server configuration document.

String adminUserId = "garygeeke";
String serverName = "active-metadata-server";
String adminPlatformURLRoot = "https://127.0.0.1:9443";

OMAGServerConfigurationClient configurationClient = new OMAGServerConfigurationClient(adminUserId, 
                                                                                      serverName, 
                                                                                      adminPlatformURLRoot);


String organizationName = "Coco Pharmaceuticals";
String serverDescription = "This server supports the governance teams";
String serverUserId = "cocomds2npa";
String serverPassword = "secret";
String serverURLRoot = "https://localhost:9443";
int    maxPageSize = 1000;

configurationClient.setBasicServerProperties(organizationName,
                                             serverDescription,
                                             serverUserId,
                                             serverPassword,
                                             serverURLRoot,
                                             maxPageSize);
admin_user_id="garygeeke"
server_name="active-metadata-store"
admin_platform_url_root="https://127.0.0.1:9443"

config_client=CoreServerConfig(server_name,
                               admin_platform_url_root,
                               admin_user_id)

local_server_description="This server supports the governance teams"
organization_name="Coco Pharmaceuticals"
local_server_url="https://127.0.0.1:9443"
local_server_user_id="cocomds2npa"
local_server_password="secret"
max_page_size = 1000

config_client.set_basic_server_properties(local_server_description,
                                          organization_name,
                                          local_server_url,
                                          local_server_user_id,
                                          local_server_password,
                                          max_page_size)

POST {{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/server-properties
with a request body of:
{
  "localServerDescription" : "This server supports the governance teams",
  "organizationName" : "Coco Pharmaceuticals",
  "localServerURL" : "https://localhost:9443",
  "localServerUserId" : "cocomds2npa",
  "localServerPassword" : "secret",
  "maxPageSize" : 600
}
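
As a rough illustration, the REST call above could be assembled with only the Python standard library. This is a minimal sketch, not part of any Egeria client: the function name `build_server_properties_request` is hypothetical, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_server_properties_request(platform_url_root, admin_user_id,
                                    server_name, properties):
    """Build (but do not send) the POST request for the server-properties endpoint."""
    url = (f"{platform_url_root}/open-metadata/admin-services/users/"
           f"{admin_user_id}/servers/{server_name}/server-properties")
    body = json.dumps(properties).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_server_properties_request(
    "https://127.0.0.1:9443",
    "garygeeke",
    "active-metadata-store",
    {
        "localServerDescription": "This server supports the governance teams",
        "organizationName": "Coco Pharmaceuticals",
        "localServerURL": "https://localhost:9443",
        "localServerUserId": "cocomds2npa",
        "localServerPassword": "secret",
        "maxPageSize": 600,
    },
)
print(request.get_method(), request.full_url)
```

Sending the request is left to the caller, for example with `urllib.request.urlopen(request)`. A platform that uses a self-signed certificate would also need a suitable SSL context.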

Alternatively, you can set these properties one at a time.

setServerDescription

The server description should be set to something that describes the OMAG Server's role. It may be the name of a specific product that it is enabling, or a role in the metadata and governance landscape. Its purpose is to help administrators identify which server configuration they need to work with.

POST {{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/server-description
The description is passed in the request body as a text string.

setOrganizationName

The organization name may be the owning organization or you may use it to identify the department or team that is supported by this server.

POST {{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/organization-name?name="{{organizationName}}"

setServerUserId

The server's userId is used when processing requests that do not have an end user, such as receiving an event from a topic. The default value is OMAGServer. Ideally each server should have its own userId, so it is possible to restrict the resources that each server has access to and identify the origin of updates to the metadata elements.

POST {{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/server-user-id?id="{{serverUserId}}"

setServerPassword

If the password is specified, the userId and password combination are used to provide authentication information on each REST call made by the server.

POST {{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/server-user-password?password="{{serverUserPassword}}"

setServerURLRoot

Set targetPlatformURLRoot to the platform URL root of the OMAG Server Platform where the server will run. This may differ from platformURLRoot if the configuration document will be deployed to a different OMAG Server Platform from the one used to maintain it.

POST {{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/server-url-root?url={{targetPlatformURLRoot}}

What is the difference between {{platformURLRoot}} and {{targetPlatformURLRoot}}?

The {{targetPlatformURLRoot}} gives the location of the OMAG Server Platform on which this configured service is intended to run, while the {{platformURLRoot}} gives the location of the OMAG Server Platform in which this configuration document is maintained.

They could be, but do not need to be, the same location.
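
The distinction can be made concrete with a small sketch that fills in the URL template shown earlier. The function name and both host names are hypothetical, chosen only to show a configuration document maintained on one platform and deployed to another.

```python
def server_url_root_request_url(platform_url_root, admin_user_id,
                                server_name, target_platform_url_root):
    """Fill in the server-url-root URL template.

    platform_url_root: platform where the configuration document is maintained.
    target_platform_url_root: platform where the configured server will run.
    """
    return (f"{platform_url_root}/open-metadata/admin-services/users/{admin_user_id}"
            f"/servers/{server_name}/server-url-root?url={target_platform_url_root}")


# Hypothetical hosts: the call goes to the configuration platform,
# while the value being stored points at the runtime platform.
url = server_url_root_request_url(
    "https://config-platform.example.org:9443",
    "garygeeke",
    "active-metadata-store",
    "https://runtime-platform.example.org:9443",
)
print(url)
```

When both arguments carry the same value, the configuration document is maintained on the same platform that runs the server.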

setMaxPageSize

The maximum page size value sets an upper limit on the number of results that a caller can request on any paging REST API to this server. Setting maximum page size helps to prevent a denial of service attack that uses very large requests to overwhelm the server. A value of 0 means no limit, and leaves the server open to such attacks.

POST {{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/max-page-size?limit={{maxPageSize}}
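
The effect of the limit can be sketched as a simple check. This is an illustration of the rule described above, not the server's actual code; in particular, rejecting an oversized request with an error is an assumption, and the function name is hypothetical.

```python
def effective_page_size(requested, max_page_size):
    """Return the page size to use, or raise ValueError if it exceeds the limit.

    A max_page_size of 0 means that no limit is applied.
    """
    if requested < 0 or max_page_size < 0:
        raise ValueError("page sizes must not be negative")
    if max_page_size == 0:
        # No limit configured: any request size is accepted.
        return requested
    if requested > max_page_size:
        # Assumed behaviour: oversized requests are rejected.
        raise ValueError(
            f"requested page size {requested} exceeds the maximum of {max_page_size}")
    return requested


print(effective_page_size(50, 1000))
print(effective_page_size(5000, 0))
```

The second call shows why a limit of 0 is risky: a request of any size passes the check.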

Retrieving a server's basic properties

It is possible to retrieve the basic server properties to verify the values they are set to.

getBasicServerProperties

Return the basic server properties in a single request.

String adminUserId = "garygeeke";
String serverName = "active-metadata-server";
String adminPlatformURLRoot = "https://127.0.0.1:9443";

OMAGServerConfigurationClient configurationClient = new OMAGServerConfigurationClient(adminUserId, 
                                                                                      serverName, 
                                                                                      adminPlatformURLRoot);

BasicServerProperties basicServerProperties = configurationClient.getBasicServerProperties();
admin_user_id="garygeeke"
server_name="active-metadata-store"
admin_platform_url_root="https://127.0.0.1:9443"

config_client=CoreServerConfig(server_name,
                               admin_platform_url_root,
                               admin_user_id)

config_client.get_basic_server_properties()
GET {{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/server-properties
Configuring the default Event Bus

Set up the default event bus

An OMAG Server uses an event bus such as Apache Kafka to exchange events with other servers and tools.

Egeria manages the specific topic names and the event payloads; however, it needs to know where the event bus is deployed and any properties needed to configure it.

Since the event bus is used in multiple places, the configuration document allows you to set up the details of the event bus which are then incorporated into all the places where the event bus is needed.

Important sequencing information

You need to set up this information before configuring any of the following:

The following command creates information about the event bus. This information is used on the subsequent configuration of the OMAG Server subsystems. It does not affect any subsystems that have already been configured in the configuration document and if the event bus is not needed, its values are ignored.

It is possible to add arbitrary name/value pairs as JSON in the request body. The correct properties to use are defined in the connector type for the event bus.

Fine-grained helper command

POST - configure event bus

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/event-bus
Example: Apache Kafka

For example, when using Apache Kafka as your event bus you may want to configure properties that control the behavior of the consumer that receives events and the producer that sends events. This is a typical set of producer and consumer properties:

{
    "producer": {
        "bootstrap.servers":"localhost:9092",
        "acks":"all",
        "retries":"0",
        "batch.size":"16384",
        "linger.ms":"1",
        "buffer.memory":"33554432",
        "max.request.size":"10485760",
        "key.serializer":"org.apache.kafka.common.serialization.StringSerializer",
        "value.serializer":"org.apache.kafka.common.serialization.StringSerializer",
        "kafka.omrs.topic.id":"cocoCohort"
    },
    "consumer": {
        "bootstrap.servers":"localhost:9092",
        "zookeeper.session.timeout.ms":"400",
        "zookeeper.sync.time.ms":"200",
        "fetch.message.max.bytes":"10485760",
        "max.partition.fetch.bytes":"10485760",
        "key.deserializer":"org.apache.kafka.common.serialization.StringDeserializer",
        "value.deserializer":"org.apache.kafka.common.serialization.StringDeserializer",
        "kafka.omrs.topic.id":"cocoCohort"
    }
}

A different type of event bus would use different properties.
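
As a sketch of how such a request might be prepared, the following Python builds the event bus URL and a JSON body with producer and consumer sections. The function name and its override parameters are hypothetical; only the URL path and the property names come from the example above.

```python
import json


def event_bus_request(platform_url_root, admin_user_id, server_name,
                      bootstrap_servers, producer_overrides=None,
                      consumer_overrides=None):
    """Return the (url, json_body) pair for the event bus configuration call."""
    url = (f"{platform_url_root}/open-metadata/admin-services/users/"
           f"{admin_user_id}/servers/{server_name}/event-bus")
    body = {
        # Both sections need to know where the Kafka brokers are.
        "producer": {"bootstrap.servers": bootstrap_servers,
                     **(producer_overrides or {})},
        "consumer": {"bootstrap.servers": bootstrap_servers,
                     **(consumer_overrides or {})},
    }
    return url, json.dumps(body, indent=4)


event_bus_url, event_bus_body = event_bus_request(
    "https://127.0.0.1:9443",
    "garygeeke",
    "active-metadata-store",
    "localhost:9092",
    producer_overrides={"acks": "all", "retries": "0"},
)
print(event_bus_url)
print(event_bus_body)
```

Keeping the broker address in one argument avoids the producer and consumer sections drifting apart.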

Configuring the Audit Log Destinations

Configure the audit log

Egeria's audit log provides a configurable set of destinations for audit records and other diagnostic logging for an OMAG Server. Some destinations also support a query interface to allow an administrator to understand how the server is running.

Each audit log record has a severity that can be used to route it to one or more specific destinations. Therefore, when an audit log destination is configured, it is optionally supplied with a list of severities to filter the types of audit log records it should receive.

The audit log severities are as follows:

| Severity | Description |
|---|---|
| Information | The server is providing information about its normal operation. |
| Event | An event was received from another member of the open metadata repository cohort. |
| Decision | A decision has been made related to the interaction of the local metadata repository and the rest of the cohort. |
| Action | An action is required by the administrator. At a minimum, the situation needs to be investigated and, if necessary, corrective action taken. |
| Error | An error occurred, possibly caused by an incompatibility between the local metadata repository and one of the remote repositories. The local repository may restrict some of the metadata interchange functions as a result. |
| Exception | An unexpected exception occurred. This means that the server needs some administration attention to correct configuration or fix a logic error because it is not operating as a proper peer in the open metadata repository cohort. |
| Security | Unauthorized access to a service or metadata instance has been attempted. |
| Startup | A new component is starting up. |
| Shutdown | An existing component is shutting down. |
| Asset | An auditable action relating to an asset has been taken. |
| Types | Activity is occurring that relates to the open metadata types in use by this server. |
| Cohort | The server is exchanging registration information about an open metadata repository cohort that it is connecting to. |
| Trace | This is additional information on the operation of the server that may be of assistance in debugging a problem. It is not normally logged to any destination, but can be added when needed. |
| PerfMon | This log record contains performance monitoring timing information for specific types of processing. It is not normally logged to any destination, but can be added when needed. |
| `<Unknown>` | Uninitialized severity. |

The default audit log destination is the console audit log destination. This writes selected parts of each audit log record to "standard out" (stdout).

It is configured to receive log records of all severities except Activity, Event, Trace and PerfMon. It is added automatically to a server's configuration document when other sections are configured.
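
The routing rule can be pictured with a short sketch. It is a conceptual model only, not Egeria's implementation: the function name and the `trace-files` destination are hypothetical, and the rule that an empty severity list accepts everything follows the behaviour described for the destination commands in this guide.

```python
# Severity names taken from the severity list above.
ALL_SEVERITIES = {
    "Information", "Event", "Decision", "Action", "Error", "Exception",
    "Security", "Startup", "Shutdown", "Asset", "Types", "Cohort",
    "Trace", "PerfMon",
}

# The default console destination receives everything except these severities.
DEFAULT_CONSOLE_EXCLUDED = {"Activity", "Event", "Trace", "PerfMon"}


def destinations_for(severity, destinations):
    """Return the names of the destinations that should receive a record.

    `destinations` maps a destination name to its list of supported severities;
    an empty list means the destination accepts every severity.
    """
    return [name for name, supported in destinations.items()
            if not supported or severity in supported]


destinations = {
    "console": sorted(ALL_SEVERITIES - DEFAULT_CONSOLE_EXCLUDED),
    "trace-files": ["Trace", "PerfMon"],  # hypothetical extra destination
}
print(destinations_for("Error", destinations))
print(destinations_for("Trace", destinations))
```

In this model an Error record reaches only the console, while a Trace record reaches only the extra destination that asked for it.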

Add audit log destinations

If the server is a development or test server, then the default audit log configuration is probably sufficient, and you should use the following command:

POST - set default audit log destination

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations/default

Note: Using this command overrides all previous audit log destinations configured for the server.

If this server is a production server then you will probably want to set up the audit log destinations explicitly. You can add multiple destinations and each one can be set up to receive different severities of audit log records.

There are various destinations that can be configured for the audit log:

Since the default audit log destination is also a console audit log destination, only use this option to add the Trace and PerfMon severities.

POST - add console audit log destination

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations/console

The body of the request should be a list of severities.

If an empty list is passed as the request body then all severities are supported by the destination.

This destination writes JSON files to a shared directory, one file for each audit log record.

POST - add JSON file-based audit log destination

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations/files

The body of the request should be a list of severities.

If an empty list is passed as the request body then all severities are supported by the destination.

This destination writes each log record as an event on the supplied event topic. It assumes that the event bus is set up first.

POST - add event-based audit log destination

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations/event-topic

The body of the request should be a list of severities.

If an empty list is passed as the request body then all severities are supported by the destination.

This writes full log records to the slf4j ecosystem. When configuring slf4j as a destination, you also need to specify the audit log logger category via the application properties of the OMAG Server Platform. This is described in the Connecting the OMAG Audit Log Framework section of the developer logging guide.

The configuration of the slf4j ecosystem determines its ultimate destination(s).

POST - add slf4j audit log destination

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations/slf4j

The body of the request should be a list of severities.

If an empty list is passed as the request body then all severities are supported by the destination.
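
The four severity-list commands above share the same shape, so they can be sketched with one helper. The function name is hypothetical; the URL suffixes and the convention that an empty list selects all severities are taken from the commands above.

```python
import json

# URL suffixes of the destinations that take a list of severities as the body.
DESTINATION_TYPES = ("console", "files", "event-topic", "slf4j")


def audit_log_destination_request(platform_url_root, admin_user_id, server_name,
                                  destination_type, severities=()):
    """Return the (url, json_body) pair for adding an audit log destination."""
    if destination_type not in DESTINATION_TYPES:
        raise ValueError(f"unknown destination type: {destination_type}")
    url = (f"{platform_url_root}/open-metadata/admin-services/users/{admin_user_id}"
           f"/servers/{server_name}/audit-log-destinations/{destination_type}")
    # An empty list means the destination supports all severities.
    return url, json.dumps(list(severities))


audit_url, audit_body = audit_log_destination_request(
    "https://127.0.0.1:9443", "garygeeke", "active-metadata-store",
    "console", ["Trace", "PerfMon"])
print(audit_url)
print(audit_body)
```

The connection-based destination is not covered by this helper because its body is a connection rather than a list of severities.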

This sets up an audit log destination that is described through a connection. In this case, the connection is passed in the request body and the supported severities are supplied in the connection's configuration properties.

POST - add connection-based audit log destination

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations/connection

It is also possible to set up all the audit log destinations in one command as a list of connections. Using this option overrides all previous audit log destinations and so can be used as the update command. The list of connections is passed in the request body and the supported severities are supplied in each connection's configuration properties.

POST - add a list of connection-based audit log destinations

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations

Retrieving audit log destinations

The configured list of audit log destinations can be retrieved using this command:

GET - the list of configured audit log destinations

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations

Updating audit log destinations

Audit log destinations can be updated individually, by qualified name using the following command:

POST - update connection-based audit log destination

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations/connection/{{qualifiedName}}

If you are not sure what the audit log connection is called, retrieve the list of configured audit log destinations; each connection in the returned list includes its qualified name.

Remove audit log destinations

The following will remove all audit log destinations, enabling you to add a new set of audit log destinations.

DELETE - clear all audit log destinations

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations

It is also possible to remove a single audit log destination using its connection's qualified name.

DELETE - clear the named audit log destination

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/audit-log-destinations/{{qualifiedName}}
Configuring the Server Metadata Security Connector

Configure the server security connector

Metadata that is being aggregated from different sources is likely to need comprehensive access controls.

Egeria provides fine-grained security control for metadata access. It is implemented in a server metadata security connector that is called whenever requests are made to the server.

Security is configured for a specific OMAG Server by adding a connection for this connector to the server's configuration document using the following command.

POST - configure security connector

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/security/connection

This passes in a connection used to create the server security connector in the request body.

{
    "class": "Connection",
    "connectorType": {
        "class": "ConnectorType",
        "connectorProviderClassName": "{fullyQualifiedJavaClassName}"
    }
}
Example: set up the sample server security connector

For example, this is the connection that would set up the sample server security connector provided for the Coco Pharmaceuticals case study:

{
    "class": "Connection",
    "connectorType": {
        "class": "ConnectorType",
        "connectorProviderClassName": "org.odpi.openmetadata.metadatasecurity.samples.OpenMetadataServerSecurityProvider"
    }
}
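
A request body of this shape can be generated from the provider class name alone. The helper below is a minimal sketch with a hypothetical function name; it mirrors the JSON structure shown above and adds nothing to it.

```python
import json


def security_connection_body(connector_provider_class_name):
    """Build the Connection JSON for a server metadata security connector."""
    connection = {
        "class": "Connection",
        "connectorType": {
            "class": "ConnectorType",
            "connectorProviderClassName": connector_provider_class_name,
        },
    }
    return json.dumps(connection, indent=4)


security_body = security_connection_body(
    "org.odpi.openmetadata.metadatasecurity.samples.OpenMetadataServerSecurityProvider")
print(security_body)
```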

Determine configured security

GET - query the server security connector setting

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/security/connection
Response indicating no security
{
    "class": "ConnectionResponse",
    "relatedHTTPCode": 200
}
Response indicating a specific security connector

If the response looks more like the JSON below, a connector is configured. The connectorProviderClassName tells you which connector is being used.

{
    "class": "ConnectionResponse",
    "relatedHTTPCode": 200,
    "connection": {
        "class": "Connection",
        "connectorType": {
            "class": "ConnectorType",
            "connectorProviderClassName": "{fullyQualifiedJavaClassName}"
        }
    }
}
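
Telling the two responses apart can be automated. The sketch below assumes only the response shapes shown above; the function name is hypothetical.

```python
import json


def configured_security_provider(response_text):
    """Return the connector provider class name, or None if no security is configured."""
    response = json.loads(response_text)
    connection = response.get("connection")
    if not connection:
        # No "connection" element: no security connector is configured.
        return None
    return connection.get("connectorType", {}).get("connectorProviderClassName")


no_security = '{"class": "ConnectionResponse", "relatedHTTPCode": 200}'
with_security = json.dumps({
    "class": "ConnectionResponse",
    "relatedHTTPCode": 200,
    "connection": {
        "class": "Connection",
        "connectorType": {
            "class": "ConnectorType",
            "connectorProviderClassName":
                "org.odpi.openmetadata.metadatasecurity.samples.OpenMetadataServerSecurityProvider",
        },
    },
})
print(configured_security_provider(no_security))
print(configured_security_provider(with_security))
```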

Remove configured security

DELETE - remove configured security connector

{{platformURLRoot}}/open-metadata/admin-services/users/{{adminUserId}}/servers/{{serverName}}/security/connection

This removes all authorization checking from the server.

Configuring the Lineage Warehouse Services

!!! post "POST - Configure Lineage Warehouse Services"
    ```
    {{serverURLRoot}}/open-metadata/admin-services/users/{{userId}}/servers/{{serverName}}/lineage-warehouse/configuration
    ```

    ```json
    {
        "class": "LineageWarehouseConfig",
        "openLineageDescription": "Lineage Warehouse Service is used for the storage and querying of lineage",
        "lineageGraphConnection": {
            "class": "Connection",
            "displayName": "Lineage Graph Connection",
            "description": "Used for storing lineage in the Open Metadata format",
            "connectorType": {
                "class": "ConnectorType",
                "connectorProviderClassName": "org.odpi.openmetadata.openconnectors.governancedaemonconnectors.lineagewarehouseconnectors.janusconnector.graph.LineageGraphConnectorProvider"
            },
            "configurationProperties": {
                "gremlin.graph": "org.janusgraph.core.JanusGraphFactory",
                "storage.backend": "berkeleyje",
                "storage.directory": "data/servers/{{ols-server-name}}/repository/berkeley",
                "index.search.backend": "lucene",
                "index.search.directory": "data/servers/{{ols-server-name}}/repository/searchindex"
            }
        },
        "accessServiceConfig": {
            "serverName": "{{server-name}}",
            "serverPlatformUrlRoot": "{{server-platform-url}}",
            "user": "admin",
            "password": "secret"
        },
        "inTopicConnection": {
            "class": "VirtualConnection",
            "qualifiedName": "OutTopicConnector.Asset Lineage OMAS",
            "displayName": "OutTopicConnector.Asset Lineage OMAS",
            "description": "Client-side topic connection.",
            "connectorType": {
                "class": "ConnectorType",
                "qualifiedName": "Asset Lineage Out Topic Client Connector",
                "displayName": "Asset Lineage Out Topic Client Connector",
                "description": "Connector supports the receipt of events on the Asset Lineage OMAS Out Topic.",
                "connectorProviderClassName": "org.odpi.openmetadata.accessservices.assetlineage.outtopic.connector.AssetLineageOutTopicClientProvider"
            },
            "embeddedConnections": [
                {
                    "class": "EmbeddedConnection",
                    "displayName": "Topic Event Bus",
                    "embeddedConnection": {
                        "class": "Connection",
                        "connectorType": {
                            "class": "ConnectorType",
                            "qualifiedName": "Egeria:OpenMetadataTopicConnector:Kafka",
                            "displayName": "Apache Kafka Open Metadata Topic Connector",
                            "description": "Apache Kafka Open Metadata Topic Connector supports string based events over an Apache Kafka event bus.",
                            "supportedAssetTypeName": "KafkaTopic",
                            "expectedDataFormat": "PLAINTEXT",
                            "connectorProviderClassName": "org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider",
                            "recognizedConfigurationProperties": [
                                "producer",
                                "consumer",
                                "local.server.id",
                                "sleepTime"
                            ]
                        },
                        "endpoint": {
                            "class": "Endpoint",
                            "headerVersion": 0,
                            "address": "OMRSTopic.server.omas.omas.assetlineage.outTopic"
                        },
                        "configurationProperties": {
                            "producer": {
                                "bootstrap.servers": "server:port",
                                "key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
                                "value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
                                "group.id": "custom-producer-id",
                                "kafka.omrs.topic.id": "OMRSTopic"
                            },
                            "consumer": {
                                "bootstrap.servers": "server:port",
                                "key.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
                                "value.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
                                "group.id": "custom-consumer-id",
                                "kafka.omrs.topic.id": "OMRSTopic"
                            }
                        }
                    }
                }
            ]
        },
        "backgroundJobs": [
            {
                "jobName": "LineageGraphJob",
                "jobInterval": 120,
                "jobEnabled": "false"
            },
            {
                "jobName": "AssetLineageUpdateJob",
                "jobInterval": 120,
                "jobEnabled": "false",
                "jobDefaultValue": "2021-01-01T00:00:00"
            }
        ]
    }
    ```

#### Configuration reference

| Property | Description | Is mandatory |
|---|---|---|
| `lineageGraphConnection` | OCF configuration object that defines the graph store connector type used. See [open-lineage-janus-connector](/connectors/governance-daemon/open-lineage-janus-connector) for more details. | Yes |
| `accessServiceConfig.serverName` | The name of the metadata server where the paired Asset Lineage OMAS is running. | Yes |
| `accessServiceConfig.serverPlatformUrlRoot` | The URL of the OMAG Server Platform running the metadata server where the paired Asset Lineage OMAS is running. Also see the [start-up information](#start-up-information) section. | Yes |
| `accessServiceConfig.user` | The username to access the server running Asset Lineage OMAS. | Yes |
| `accessServiceConfig.password` | The user password to access the server running Asset Lineage OMAS. Can be left out for non-secured access. | No |
| `inTopicConnection` | [Connection object](/concepts/connection) that provides the Asset Lineage OMAS topic connection definition. If provided, it overrides the default configuration. | No |
| `backgroundJobs[n].jobName` | Key used to match the job name pre-defined in the Lineage Warehouse. Supported values are `LineageGraphJob` and `AssetLineageUpdateJob`. | No |
| `backgroundJobs[n].jobInterval` | Interval (**seconds**) at which to execute the repetitive task defined by the named job above. | No |
| `backgroundJobs[n].jobEnabled` | Controls whether the job runs (enabled) or not (disabled). Omitting the item from the `backgroundJobs` list has the same effect as disabling the job. | No |
| `backgroundJobs[n].jobDefaultValue` | Initial value for the task; only used for `AssetLineageUpdateJob`. When configured and no value is present in the store, this value becomes the starting point in time to poll for updates. After a successful update the initial value is no longer used; the last known value from the store is used instead. The value should always be specified in the standard internet date-time format `YYYY-MM-DDThh:mm:ss`. See [ISO-8601](https://datatracker.ietf.org/doc/html/rfc3339#section-5.8) for more information and examples. | No |
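
Before posting a configuration it can be useful to check it against the reference table. The following is a minimal sketch of such a check; the function name is hypothetical and the rules cover only what the table states (mandatory properties, supported job names and the `jobDefaultValue` format).

```python
from datetime import datetime

SUPPORTED_JOBS = {"LineageGraphJob", "AssetLineageUpdateJob"}


def check_lineage_warehouse_config(config):
    """Return a list of problems found in a Lineage Warehouse configuration dict."""
    problems = []
    if "lineageGraphConnection" not in config:
        problems.append("lineageGraphConnection is missing")
    access = config.get("accessServiceConfig", {})
    for key in ("serverName", "serverPlatformUrlRoot", "user"):
        if not access.get(key):
            problems.append(f"accessServiceConfig.{key} is missing")
    for job in config.get("backgroundJobs", []):
        name = job.get("jobName")
        if name not in SUPPORTED_JOBS:
            problems.append(f"unsupported jobName: {name}")
        default = job.get("jobDefaultValue")
        if default is not None:
            try:
                # Expected format: YYYY-MM-DDThh:mm:ss
                datetime.strptime(default, "%Y-%m-%dT%H:%M:%S")
            except ValueError:
                problems.append(f"jobDefaultValue is not YYYY-MM-DDThh:mm:ss: {default}")
    return problems


good_config = {
    "class": "LineageWarehouseConfig",
    "lineageGraphConnection": {"class": "Connection"},
    "accessServiceConfig": {
        "serverName": "cocoMDS1",  # hypothetical metadata server name
        "serverPlatformUrlRoot": "https://localhost:9443",
        "user": "admin",
    },
    "backgroundJobs": [
        {"jobName": "AssetLineageUpdateJob", "jobInterval": 120,
         "jobEnabled": "false", "jobDefaultValue": "2021-01-01T00:00:00"},
    ],
}
print(check_lineage_warehouse_config(good_config))
```

An empty list means that the checks covered by this sketch found no problems; it does not guarantee that the server will accept the configuration.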

#### Removing the Lineage Warehouse Services from the server configuration

!!! delete "DELETE - Remove Lineage Warehouse Services from the server"
    ```
    {{serverURLRoot}}/open-metadata/admin-services/users/{{userId}}/servers/{{serverName}}/open-lineage/configuration
    ```

### Start up information

!!! info "Runtime consideration"
    To operate, the Lineage Warehouse depends on its Metadata Access Server (with the Asset Lineage OMAS) being up and running. This is because the Lineage Warehouse discovers the event bus connectivity and the topic address from Asset Lineage during start-up. Consequently, it waits and retries until this condition is met and it starts up successfully.
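
The wait-and-retry behaviour can be pictured with a generic loop. This is a conceptual sketch, not the Lineage Warehouse's actual start-up code: the function name, the retry interval and the optional attempt limit are all assumptions, and the availability probe is supplied by the caller.

```python
import time


def wait_until_available(is_available, retry_interval_seconds=5.0,
                         max_attempts=None, sleep=time.sleep):
    """Call is_available() until it returns True; return the number of attempts.

    With max_attempts=None the loop retries indefinitely, matching the
    behaviour described above.
    """
    attempt = 0
    while True:
        attempt += 1
        if is_available():
            return attempt
        if max_attempts is not None and attempt >= max_attempts:
            raise TimeoutError(f"not available after {attempt} attempts")
        sleep(retry_interval_seconds)


# Simulated probe: the metadata access server answers on the third attempt.
answers = iter([False, False, True])
attempts = wait_until_available(lambda: next(answers), sleep=lambda seconds: None)
print(attempts)
```

Passing the probe and the sleep function as arguments keeps the loop testable without a running metadata access server.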

