Repository proxy (adapter) polling example using files
- Connector Category: Repository and Event Mapper Connectors
- Hosting Service: Local OMRS Repository Connector
- Hosting Server: repository proxy
- Source Module: sample lineage integration connector
- Jar File Name:
This is a repository proxy implementation that shows how to bring the metadata associated with the files in a folder into Egeria.
It showcases a pattern in which the OMRS repository connector has an embedded repository connector that caches content. Previously, a repository proxy had to implement every find request itself. In this pattern, the OMRS requests are delegated to the embedded OMRS repository, which simplifies development.
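The delegation idea can be sketched in Java. Every type in the sketch (Entity, MetadataRepository, InMemoryRepository, CachingRepositoryProxy) is hypothetical and greatly simplified, because Egeria's real OMRS metadata collection interface has many more methods. The sketch only shows that the proxy holds no query logic of its own.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical, simplified entity (the real OMRS type is much richer).
record Entity(String guid, String typeName, String qualifiedName) {}

// Hypothetical minimal repository interface, NOT Egeria's OMRSMetadataCollection.
interface MetadataRepository {
    void saveEntity(Entity entity);
    Entity getEntity(String guid);
    List<Entity> findEntitiesByType(String typeName);
}

// Stand-in for the embedded caching repository (for example, an in-memory store).
class InMemoryRepository implements MetadataRepository {
    private final Map<String, Entity> store = new ConcurrentHashMap<>();

    public void saveEntity(Entity entity) { store.put(entity.guid(), entity); }

    public Entity getEntity(String guid) { return store.get(guid); }

    public List<Entity> findEntitiesByType(String typeName) {
        return store.values().stream()
                .filter(e -> e.typeName().equals(typeName))
                .collect(Collectors.toList());
    }
}

// The proxy implements no find logic itself: every request goes to the embedded repository.
class CachingRepositoryProxy implements MetadataRepository {
    private final MetadataRepository embedded;

    CachingRepositoryProxy(MetadataRepository embedded) { this.embedded = embedded; }

    public void saveEntity(Entity entity) { embedded.saveEntity(entity); }

    public Entity getEntity(String guid) { return embedded.getEntity(guid); }

    public List<Entity> findEntitiesByType(String typeName) {
        return embedded.findEntitiesByType(typeName);
    }
}
```

Swapping the embedded repository (in-memory or XTDB, as offered later in this page) then changes where content is cached without touching the proxy.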
If a member of the cohort does not issue federated queries, then it cannot get existing content from a 3rd party repository. This sample gives an example of how an event mapper can be written to poll the 3rd party technology and send a batched event for each asset. The batched event contains:
- the asset (representing the file)
- connector type
The OMRS caching repository connector needs to be configured to store the metadata.
The pattern is that the event mapper polling loop:
- Gets the file information from the file system.
- Adds the appropriate reference entities and relationships to the repository connector.
- Finds the entities and relationships for each asset.
- Sends a batched event per asset.
- Waits for the length of time specified in the refreshTimeInterval configuration parameter.
- Repeats.
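The loop above can be sketched in Java under several assumptions, and it is not the sample's actual event mapper code. The embedded repository is reduced to a map, and sending a batched OMRS event is reduced to a Consumer callback. The AssetBatch record and the qualifiedName scheme are hypothetical. Only refreshTimeInterval comes from this page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch: one asset plus the qualifiedNames of its related entities.
record AssetBatch(String assetQualifiedName, List<String> relatedQualifiedNames) {}

class FilePollingLoop {
    private final Path folder;                           // from the event mapper Endpoint address
    private final String prefix;                         // qualifiedName prefix
    private final Long refreshTimeInterval;              // milliseconds; null means poll once
    private final Consumer<AssetBatch> sendBatchedEvent; // stand-in for the OMRS event API
    // Stand-in for the embedded repository: asset qualifiedName -> related qualifiedNames.
    private final Map<String, List<String>> embeddedRepository = new LinkedHashMap<>();

    FilePollingLoop(Path folder, String prefix, Long refreshTimeInterval,
                    Consumer<AssetBatch> sendBatchedEvent) {
        this.folder = folder;
        this.prefix = prefix;
        this.refreshTimeInterval = refreshTimeInterval;
        this.sendBatchedEvent = sendBatchedEvent;
    }

    /** One pass of the loop; returns the number of batched events sent. */
    int pollOnce() throws IOException {
        // 1. Get the file information from the file system.
        List<Path> files;
        try (Stream<Path> listing = Files.list(folder)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        // 2. Add the entities and relationships to the embedded repository.
        for (Path file : files) {
            String asset = prefix + file.getFileName();
            embeddedRepository.put(asset, List.of(
                    asset + "::Connection", asset + "::Endpoint", prefix + "ConnectorType"));
        }
        // 3 and 4. Find the content for each asset and send one batched event per asset.
        int sent = 0;
        for (Path file : files) {
            String asset = prefix + file.getFileName();
            sendBatchedEvent.accept(new AssetBatch(asset, embeddedRepository.get(asset)));
            sent++;
        }
        return sent;
    }

    /** 5 and 6. Wait refreshTimeInterval ms, then repeat; a null interval polls only once. */
    void run() throws IOException, InterruptedException {
        while (true) {
            pollOnce();
            if (refreshTimeInterval == null) {
                return;
            }
            Thread.sleep(refreshTimeInterval); // an interrupt ends the loop via InterruptedException
        }
    }
}
```

In this sketch, run() would typically execute on its own thread and be stopped by interrupting that thread.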
A subset of the open types is required for this sample:

Entity types:
- DataFile
- Connection
- ConnectorType
- Endpoint

Relationship types:
- ConnectionEndpoint
- ConnectionConnectorType
- ConnectionToAsset
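The content for one file asset can be modelled with these types. This Java sketch uses hypothetical record types, not Egeria's OMRS classes. The qualifiedName scheme and the end1/end2 placement are assumptions for illustration only, since the open types define the real relationship ends. The sketch builds the three relationships for one file and checks that only the required types above are used.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-ins for OMRS entities and relationships (not Egeria classes).
record EntityRef(String typeName, String qualifiedName) {}
record RelationshipRef(String typeName, EntityRef end1, EntityRef end2) {}

class FileAssetGraph {
    static final Set<String> ENTITY_TYPES =
            Set.of("DataFile", "Connection", "ConnectorType", "Endpoint");
    static final Set<String> RELATIONSHIP_TYPES =
            Set.of("ConnectionEndpoint", "ConnectionConnectorType", "ConnectionToAsset");

    /** Builds the relationships for one file; the naming scheme is illustrative only. */
    static List<RelationshipRef> forFile(String prefix, String fileName) {
        EntityRef dataFile = new EntityRef("DataFile", prefix + fileName);
        EntityRef connection = new EntityRef("Connection", prefix + fileName + "::Connection");
        EntityRef endpoint = new EntityRef("Endpoint", prefix + fileName + "::Endpoint");
        // A single ConnectorType entity can be shared by every file asset.
        EntityRef connectorType = new EntityRef("ConnectorType", prefix + "ConnectorType");
        return List.of(
                new RelationshipRef("ConnectionToAsset", connection, dataFile),
                new RelationshipRef("ConnectionEndpoint", endpoint, connection),
                new RelationshipRef("ConnectionConnectorType", connection, connectorType));
    }

    /** Collects the entity type names found at either end of the relationships. */
    static Set<String> entityTypesUsed(List<RelationshipRef> relationships) {
        return relationships.stream()
                .flatMap(r -> Stream.of(r.end1(), r.end2()))
                .map(EntityRef::typeName)
                .collect(Collectors.toSet());
    }

    /** True if every entity and relationship type is in the required subset. */
    static boolean usesOnlyRequiredTypes(List<RelationshipRef> relationships) {
        return ENTITY_TYPES.containsAll(entityTypesUsed(relationships))
                && relationships.stream().allMatch(r -> RELATIONSHIP_TYPES.contains(r.typeName()));
    }
}
```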
- You should be familiar with how to set up Egeria.
- You need to decide which embedded repository you will use, and ensure that the appropriate JAR files are picked up by the OMAG Server Platform.
- Follow the instructions below to configure and run the sample.
The Gradle jar task includes some of the dependencies in the connector JAR, making it a semi-fat JAR. This ensures that these additional dependencies are deployed automatically together with the connector.
Repository Proxy Connector embedded configuration
Configure the event mapper connector
Any open metadata repository that supports its own API must also implement an event mapper to ensure the Open Metadata Repository Services (OMRS) is notified when metadata is added to the repository without going through the open metadata APIs.
The event mapper is a connector that listens for proprietary events from the repository and converts them into calls to the OMRS. The OMRS then distributes this new metadata.
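A minimal Java sketch of this listen-and-convert role follows. It assumes a hypothetical proprietary event format (ProprietaryEvent) and a hypothetical OmrsNotifier interface standing in for the real OMRS event APIs. Note that the file sample on this page polls instead of listening. Events with no open metadata equivalent are simply ignored here.

```java
// Hypothetical event emitted by a third-party repository.
record ProprietaryEvent(String action, String fileName) {}

// Hypothetical, much-reduced stand-in for the OMRS side; Egeria's real interfaces differ.
interface OmrsNotifier {
    void newEntity(String typeName, String qualifiedName);
    void deletedEntity(String typeName, String qualifiedName);
}

class SimpleEventMapper {
    private final OmrsNotifier omrs;
    private final String qualifiedNamePrefix;

    SimpleEventMapper(OmrsNotifier omrs, String qualifiedNamePrefix) {
        this.omrs = omrs;
        this.qualifiedNamePrefix = qualifiedNamePrefix;
    }

    /** Converts one proprietary event into the matching OMRS call, if there is one. */
    void onEvent(ProprietaryEvent event) {
        String qualifiedName = qualifiedNamePrefix + event.fileName();
        switch (event.action()) {
            case "created" -> omrs.newEntity("DataFile", qualifiedName);
            case "deleted" -> omrs.deletedEntity("DataFile", qualifiedName);
            default -> { } // no open metadata equivalent: ignore
        }
    }
}
```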
When configuring the event mapper, connectorProvider should be set to the fully-qualified Java class name of the connector provider, and eventSource should give the details of how to access the events (for example, the hostname and port number of an Apache Kafka bootstrap server).
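A class name string is enough to instantiate a provider at run time. The Java sketch below shows the general mechanism using the JDK's Class.forName, which only resolves fully-qualified names. It is not Egeria's own connector-loading code. The ProviderLoader class and its expected-type check are hypothetical.

```java
class ProviderLoader {
    /**
     * Loads a class by its fully-qualified name and creates an instance with its
     * public no-argument constructor, checking that it has the expected type.
     */
    static <T> T instantiate(String fullyQualifiedClassName, Class<T> expectedType)
            throws ReflectiveOperationException {
        // A simple name such as "ArrayList" fails here with ClassNotFoundException.
        Class<?> loaded = Class.forName(fullyQualifiedClassName);
        if (!expectedType.isAssignableFrom(loaded)) {
            throw new IllegalArgumentException(
                    fullyQualifiedClassName + " is not a " + expectedType.getName());
        }
        return expectedType.cast(loaded.getDeclaredConstructor().newInstance());
    }
}
```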
Sample file connector configuration overview
The event mapper Endpoint address should be set to the name of the local folder to monitor.
| Event mapper configuration parameter name | Description |
|---|---|
| *(qualifiedName prefix)* | A prefix for the qualifiedName. This prefix is used on every entity that is created using this connector. |
| refreshTimeInterval | Poll interval in milliseconds. If null, the connector polls only once, at start-up. |
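These two parameters and the Endpoint address could be read into a small value object, sketched below in Java. The key name "qualifiedNamePrefix" is HYPOTHETICAL because this page does not show the prefix parameter's name. Defaulting a missing prefix to an empty string is also an assumption. refreshTimeInterval is the name used earlier on this page.

```java
import java.nio.file.Path;
import java.util.Map;

record FileSampleConfig(Path folder, String qualifiedNamePrefix, Long refreshTimeInterval) {

    // HYPOTHETICAL key name: not shown on this page.
    static final String PREFIX_KEY = "qualifiedNamePrefix";
    // Poll-interval parameter named on this page.
    static final String INTERVAL_KEY = "refreshTimeInterval";

    static FileSampleConfig from(String endpointAddress, Map<String, Object> configurationProperties) {
        Object prefix = configurationProperties.get(PREFIX_KEY);
        Object interval = configurationProperties.get(INTERVAL_KEY);
        Long refresh;
        if (interval == null) {
            refresh = null;                          // null interval: poll once at start-up
        } else if (interval instanceof Number n) {
            refresh = n.longValue();                 // JSON numbers may arrive as Integer or Long
        } else {
            refresh = Long.parseLong(interval.toString());
        }
        return new FileSampleConfig(Path.of(endpointAddress),
                prefix == null ? "" : prefix.toString(), refresh);
    }

    boolean pollOnlyOnce() { return refreshTimeInterval == null; }

    String qualifiedNameFor(String fileName) { return qualifiedNamePrefix + fileName; }
}
```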
An example of how to configure the repository proxy is provided in a Postman collection.
Note that the Postman requests are named starting with a number. Run the requests in numerical order. When you have completed request 3, decide which embedded repository connector you want to run, and choose either:
- 4a. for the in-memory repository connector
- 4b. for XTDB. Please ensure that
Verifying it is working
- The audit log content shows progress.
- You can see the content of the connector using the Repository Explorer (Rex) from the Eco-system UI. Be aware that you need to configure the Rex view service to include the repository proxy server, with an entry in the configuration similar to the following (where cocofile is the server name):

      {
        "class" : "ResourceEndpointConfig",
        "resourceCategory" : "Server",
        "serverInstanceName" : "Caching Repository proxy file sample",
        "serverName" : "cocofile",
        "platformName" : "Platform1",
        "description" : "Caching Repository proxy file sample"
      }

  Change serverName to match your server (the 'server' in the Postman collection).
Restrictions and considerations
- Normally, a cohort member gets information about the metadata behind a repository proxy by issuing queries to the cohort and receiving add, update and delete notifications via OMRS events. If federated queries are being issued, there is often no need for the event mapper to poll.
- Polling as in this pattern means that all content is cached in the embedded repository. This may not be desirable if there is a large amount of metadata in the 3rd party technology.
- The batched events contain all the information associated with an asset. If a listener were monitoring the 3rd party technology (here, the file system), it could pick up incremental changes and keep the cache up to date.
- The batched events could flood the cohort(s) if the interval is too short and there is a lot of metadata.
- An integration connector or standard repository proxy pattern could be preferable for many setups.
- If there is a requirement to write to the 3rd party technology, then the OMRS repository connector would need to be re-implemented as it would need to include code to write to the 3rd party technology.
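For the file system, the listener mentioned in the third consideration could be built on the JDK's java.nio.file.WatchService, as in the Java sketch below. The sample does not do this; it is a swapped-in technique for illustration. FolderChangeListener is a hypothetical name. On an OVERFLOW event, changes have been lost and a full re-poll would be needed.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

class FolderChangeListener implements AutoCloseable {
    private final WatchService watcher;

    FolderChangeListener(Path folder) throws IOException {
        this.watcher = FileSystems.getDefault().newWatchService();
        folder.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
    }

    /** Waits up to timeoutMillis; returns entries such as "ENTRY_CREATE a.csv". */
    List<String> awaitChanges(long timeoutMillis) throws InterruptedException {
        List<String> changes = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return changes; // nothing changed within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events were lost; a full re-poll would be needed
            }
            Path name = (Path) event.context(); // file name relative to the watched folder
            changes.add(event.kind().name() + " " + name);
        }
        key.reset(); // re-arm the key so that later changes are reported
        return changes;
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

Each reported change could then drive an incremental update of only the affected asset, in place of re-sending a batch for every file.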
Reference materials for developers
- https://github.com/odpi/egeria/blob/main/open-metadata-implementation/repository-services/README.md and its sub-pages are great resources for developers.
- Egeria Webinars, particularly the one on repository connectors.