June 2024

Welcome to the Egeria community's June 2024 newsletter. This is our first newsletter this year. Since our last newsletter, the community has been heads down on version 5.0.

Version 5.0 reflects a shift in Egeria's strategy. Egeria is still essentially an open metadata exchange and governance capability. However, its initial remit was to supply libraries to vendors to include in their tools and products so that these products could exchange metadata with one another.

The take-up of Egeria by vendors has been slow, but the challenge of linking the myriad technologies deployed in many organizations is still all too prevalent, particularly as the use of data and AI grows, along with new regulations, hybrid-cloud deployments and staff turnover. Therefore, we have shifted our focus to developing and packaging our technology for people who want to use Egeria. Version 5.0 is the first step on that path, with its focus on simplification, and on supporting users with the technologies they are most comfortable working with. Most notable for V5.0 is the introduction of pyegeria, a comprehensive Python package for open metadata and governance.

The community has also continued to innovate. Whilst Egeria does not aim to replace current data catalogs, it is still challenging the traditional approaches to metadata management through its focus on open formats for metadata, metadata distribution, linking metadata with master data and reference data, and using metadata in decision-making. To this end we have introduced a survey framework to help organizations decide which data and systems are valuable enough to add to their metadata catalogs. This aims to speed up the cataloguing process, and avoid having useless content clogging up their metadata catalogs.

Finally, there are the standard code clean-ups and enhancements to existing features that are typical of any software release.

Enjoy ...

Survey before catalog

For organizations starting a data or system cataloguing initiative, the temptation is to "catalog everything" and sift through the results afterwards, using the UI of the metadata catalog. However, metadata ingestion into a catalog can take a lot of time, and results in a bloated catalog where the valuable assets are lost in a sea of irrelevant content.

Instead, imagine having a survey report of one of your file systems that gives you an overview of its content, classified by type and use. You then run more detailed survey reports on specific directories, or types of files, helping you to pinpoint where the interesting data is located. The automated cataloguing is directed to this area, rapidly ingesting metadata about the most valuable digital resources into your catalog.
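To make the idea concrete, here is a minimal sketch of a "survey before catalog" pass over a file system. It is not Egeria's survey action framework and uses no Egeria APIs; the function name `survey_directory` and the shape of the report are assumptions chosen purely for illustration. It walks a directory tree and summarizes the files by extension, without recording any per-file metadata.

```python
import os
from collections import Counter
from pathlib import Path


def survey_directory(root):
    """Summarize the files under root by extension (hypothetical helper)."""
    file_counts = Counter()
    byte_totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Group by lower-cased extension; files without one share a bucket.
            ext = path.suffix.lower() or "(none)"
            file_counts[ext] += 1
            try:
                byte_totals[ext] += path.stat().st_size
            except OSError:
                # Unreadable or vanished file: still counted, size skipped.
                pass
    return {
        "file_counts": dict(file_counts),
        "byte_totals": dict(byte_totals),
    }
```

Running this on a top-level directory gives the overview; running it again on a promising subdirectory gives the more detailed picture that would then guide where automated cataloguing is pointed.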

Automated cataloguing is ongoing, constantly picking up new resources, as they are created and changed. Surveys are similar. They can be configured to run on a regular schedule, so that they capture a picture of how the resources are changing over time. This is useful for the catalog users and may identify new areas of value in your IT landscape.

Egeria's new survey action framework provides the mechanisms to support pluggable survey action services, each of which connects to a specific technology and produces a survey report. In version 5.0 we have developed a variety of survey action services for file systems and PostgreSQL databases as a way to validate the framework. No doubt future releases will include additional survey action services for other types of technology. There is also an SDK for developing your own survey action services to plug into Egeria.

First-class support for Python programmers

Python is growing in popularity, particularly amongst data professionals. Egeria's runtime is written in Java and provides REST APIs and Java clients for calling Egeria. In release 5.0 we have introduced pyegeria, a Python library that calls selected Egeria REST APIs. In particular, it supports:

  • The operational services for configuring, starting and stopping servers, and for querying their status.
  • The view services to work with open metadata and governance services. These services allow you to query and maintain open metadata, and configure and run open governance services such as surveys and automated cataloguing.

Pyegeria also provides monitoring widgets.

Pyegeria can be downloaded and installed from PyPI.
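Assuming the PyPI distribution is named `pyegeria` (as the library's name suggests), installation would typically be `pip install pyegeria`. The short sketch below checks whether the package is importable in the current environment. The helper name `pyegeria_available` is hypothetical, and the check assumes the import name matches the package name.

```python
import importlib.util


def pyegeria_available():
    """Return True if a module named 'pyegeria' can be found (hypothetical helper)."""
    # Assumption: the import name is the same as the PyPI package name.
    return importlib.util.find_spec("pyegeria") is not None
```

This only confirms that the package can be located; it does not contact an Egeria server.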

Ready to go

Version 5.0 comes with pre-built configuration for its servers and connectors, so you can simply download, build and run Egeria in three easy steps. The pre-built configuration is sufficient to run most of Egeria's features, either as part of an evaluation or as your first deployment. This is backed up by a new blog series called Getting Started with Egeria.

There is also a script for building a customized Docker image, making it easy to tailor Egeria to your own container-based deployment.

Data (Digital) Products

Egeria has a new REST API called Collection Manager OMVS that supports building a digital product catalog that is organized into a hierarchy of folders. Egeria uses the term digital product rather than data product because such products can include APIs as well as data collections.

Cataloguing earth observation data

Earth observation professionals have traditionally organized satellite data as collections of related data sets, just like a digital product. In fact, the approach used by earth observation catalogs such as USGS and the SpatioTemporal Asset Catalog (STAC) standard was used as input into the design of Egeria's digital product support.

As a result of working with this type of data, we have extended the Open Metadata Types to allow geospatial information, such as elevation and a bounding box (bbox), to be attached to a data set description. This is implemented in the DataScope classification.

Egeria Explorers

Egeria has a new graphical website for exploring the open metadata types.

This website is built from the brain technology. Thank you to Pragmatic Data Research Ltd for hosting the website.

Full release notes for V5.0

The full release notes for version 5.0 can be found on Egeria's website.

Future plans

Release 5.1 will continue to build out new view services supported by pyegeria, and connectors, such as survey action services, to increase its out-of-the-box connectivity.

There is also work on a new text-based UI for supporting data professionals in their work.

Connecting with the project

With our new focus on Python and supporting data practitioners directly, the community is looking for new contributors to the project. If these topics interest you, then please contact us and we can help you get started.

Connecting with the community

Go to our community guide to find out more about the activities of the Egeria project and how to get engaged.