
Building the intelligent enterprise

Every week we hear of new tools, data platforms and opportunities for organizations to embrace advanced digital technologies such as artificial intelligence. Yet despite significant investment and the attention of talented people, few organizations succeed in making wide and systematic use of their data.

Today's IT is at the heart of the problem.

Many teams use tools and data platforms that recognize the value of metadata, but this metadata is managed in a siloed, proprietary way that assumes each tool/platform is the sole guardian of this key resource. The result is that knowledge is not shared across teams that use different tool sets, creating artificial barriers to sharing and collaboration.

What is Egeria?

Egeria is an open source project dedicated to enabling teams to collaborate by making metadata, and related business context, both open and interoperable between tools and platforms, no matter which vendor they come from.

It provides a set of unified capabilities that can be flexibly deployed to support a small team, or incrementally scaled up to cover the whole enterprise. It embraces greenfield development, brownfield development and the more common mixture of the two :). It scales from a Raspberry Pi to a multi-tenant, enterprise-grade deployment. Its distributed nature makes it cloud-native: it can be deployed locally, or distributed across multiple cloud platforms and on-premises environments. Egeria puts you in the driving seat.

Ready to run Egeria?

Egeria Workspaces is the best way to run Egeria if you are new to this technology. It offers a preconfigured, containerized environment that you can quickly download and run. Once it is running, Egeria Workspaces provides a Jupyter notebook environment, a command line and a Markdown environment for activating Egeria's solutions and configuring them to work with your organization's digital resources. There is also a web server, an Apache Kafka event bus, an Open Lineage Proxy and a PostgreSQL server to experiment with.

Each major capability of Egeria is demonstrated through Jupyter notebooks, helping you understand and apply Egeria to your organization's needs as quickly as possible. As you become familiar with Egeria, you can activate additional runtimes such as Unity Catalog, Apache Atlas, Apache Airflow and Apache Superset to make use of Egeria's integration with them. Egeria Workspaces is set up to run Egeria's solutions. If you are looking for something different, Egeria's patterns describe how its commonly useful capabilities can be consumed.

In addition to Egeria Workspaces, deployment options include:

If you want to build your own Egeria deployment, consult the Planning Guide.

Why do we focus on metadata?

The term metadata can be misleading. For those with a database background, it conjures up images of database schemas, tables and columns. The photographer sees it as the properties captured by their camera to record their camera settings, location and date/time.

The Egeria community agrees with these definitions, but takes it much further. We see it as the description of the structure and linkages of people, process, technology and time. It has been suggested that we should use the term meta-information, but that is likely to create greater confusion :).

To help, we have captured the range of metadata we work with in Categories of Metadata. You will notice that it covers the who, what, why, where, when and how of an organization's operation and, most importantly, the linkages between them. This helps an organization build a knowledge base that provides the foundation for its ability to flex to meet the challenges of the future.

With that idea in place, you are ready to understand why we devote our free time to building this technology.

Open metadata and governance manifesto

Our guiding beliefs, listed below, were formed when the project started in 2019, and beyond the odd phrasing tweak, they have remained the same:

  • The maintenance of metadata must be automated to scale to the sheer volumes and variety of data involved in modern business. Similarly, metadata should be used to drive the exploitation of data and create a business-friendly logical interface to the data landscape.
  • The availability of metadata management must become ubiquitous across all data platforms, regardless of deployment environment, so that the processing engines on these platforms can rely on its availability and build capability around it.
  • Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata and standard interfaces for manipulating metadata.
  • Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
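The third belief calls for unique identifiers, standardized metadata types and standard interfaces. The sketch below illustrates how those three ideas fit together; it is a minimal illustration under stated assumptions, not Egeria's API. The class names (`MetadataElement`, `MetadataStore`, `InMemoryStore`) and the type name `RelationalTable` are hypothetical, and a random UUID stands in for whatever identifier scheme a real deployment uses.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Protocol


@dataclass
class MetadataElement:
    """A hypothetical, vendor-neutral metadata element."""
    type_name: str                   # drawn from a shared, standardized type system
    properties: Dict[str, str]
    # Globally unique identifier so any tool can refer to this element unambiguously.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))


class MetadataStore(Protocol):
    """A hypothetical standard interface that tools from any vendor could call."""
    def create(self, element: MetadataElement) -> str: ...
    def get(self, guid: str) -> MetadataElement: ...


class InMemoryStore:
    """One possible implementation; another platform could supply its own."""
    def __init__(self) -> None:
        self._elements: Dict[str, MetadataElement] = {}

    def create(self, element: MetadataElement) -> str:
        self._elements[element.guid] = element
        return element.guid

    def get(self, guid: str) -> MetadataElement:
        return self._elements[guid]


# Tool A registers a table; tool B, from a different vendor, looks it up by GUID.
store: MetadataStore = InMemoryStore()
guid = store.create(MetadataElement("RelationalTable", {"name": "customers"}))
found = store.get(guid)
print(found.type_name, found.properties["name"])
```

Because both tools depend only on the shared interface and the shared identifier, either side can be replaced without the other needing to know which vendor's store sits behind it.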

The project is constantly evolving. As a community, we have needed to invent new techniques to enable heterogeneous metadata capture and interoperability. We have taken self-service to a new level and created options for capturing knowledge from subject-matter experts who hate traditional browser-based UIs.

Given this experience, we would add that delivering on this manifesto takes attention to detail (because that is where the devil is), a willingness to embrace new standards and technologies, and the strength of conviction to forge a new path away from the approach of the traditional metadata catalogs.

Want to join the Egeria Community?

We are an open, friendly group and would be interested to hear from you. We have development work in Java, Python and UI development. We are working on enabling Egeria for AI. In addition, we have roles for writers, advocates and those willing to contribute their knowledge of tricky problems in this space, experience of techniques that worked, and other war stories. Alternatively, if you just like solving complex problems, please give us some consideration.

The open metadata ecosystem

The content of the data/metadata shared between teams needs to follow standards that ensure clarity, both in meaning and how it should be used and managed. Its completeness and quality need to be appropriate for the organization's uses. These uses will change over time.

The ecosystem that supplies and uses this data/metadata must evolve and adapt to the changing and growing needs of the organization, because trust is required not just for today's operation but also into the future.

You can make your own choices on how to build trust in your data/metadata. Egeria provides standards, mechanisms and practices built from industry experiences and best practices that help in the maintenance of data/metadata:

  • Egeria defines a standard format for storing and distributing metadata. This includes an extendable type system so that any type of metadata that you need can be supported.

  • Egeria provides technology to manage, store and distribute this standardized metadata. This technology is inherently distributed, enabling you to work across multiple cloud platforms, data centres and other distributed environments. Collectively, a deployment of this technology is referred to as the open metadata ecosystem.

  • Egeria provides connector interfaces to allow third party technology to plug into the open metadata ecosystem. These connectors translate metadata from the third party technology's native format to the open metadata format. This allows:

    • Collaboration
    • Blending automation and manual processes
    • Comprehensive security and privacy controls
  • Egeria's documentation provides guidance on how to use this technology to deliver business value.
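The extendable type system and the connector translation step described above can be sketched together. This is a simplified illustration, assuming hypothetical names throughout (`TypeDef`, `TypeRegistry`, `translate_native_table`, and the types `Asset`, `Table` and `SalesTable`); it does not reproduce Egeria's real type definitions or connector interfaces. Subtypes inherit their supertype's attributes, so an organization can add its own types without altering the standard ones, and the connector maps a third-party tool's native record into an element that conforms to the shared types.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TypeDef:
    """Hypothetical type definition; subtypes inherit their supertype's attributes."""
    name: str
    attributes: List[str]
    supertype: Optional["TypeDef"] = None

    def all_attributes(self) -> List[str]:
        # Walk up the inheritance chain, supertype attributes first.
        inherited = self.supertype.all_attributes() if self.supertype else []
        return inherited + self.attributes


class TypeRegistry:
    """Hypothetical registry; new types can be added without changing existing ones."""
    def __init__(self) -> None:
        self.types: Dict[str, TypeDef] = {}

    def add(self, typedef: TypeDef) -> TypeDef:
        self.types[typedef.name] = typedef
        return typedef


registry = TypeRegistry()
asset = registry.add(TypeDef("Asset", ["qualifiedName", "displayName"]))
table = registry.add(TypeDef("Table", ["rowCount"], supertype=asset))
# An organization-specific extension of the standard types.
registry.add(TypeDef("SalesTable", ["region"], supertype=table))


def translate_native_table(native: Dict[str, object], registry: TypeRegistry) -> Dict[str, object]:
    """Hypothetical connector step: map a third-party tool's native record
    into an open metadata element of type 'Table'."""
    typedef = registry.types["Table"]
    props = {
        "qualifiedName": f"{native['db']}.{native['tbl_name']}",
        "displayName": native["tbl_name"],
        "rowCount": native["num_rows"],
    }
    # Reject any property that the shared type system does not define.
    unknown = set(props) - set(typedef.all_attributes())
    if unknown:
        raise ValueError(f"Unknown attributes for {typedef.name}: {unknown}")
    return {"typeName": typedef.name, "properties": props}


native_record = {"db": "sales", "tbl_name": "orders", "num_rows": 1200}
print(translate_native_table(native_record, registry))
```

Validating against the shared type definitions at the connector boundary is one way to keep metadata from different tools consistent once it enters the ecosystem; a real connector would also handle updates, deletions and events.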

Summary - What is included in Egeria?

Egeria provides Apache 2.0 licensed standards and technology to support the deployment of the open metadata ecosystem. This is augmented with reference data, connectors and services to provide out-of-the-box solutions to common data management and governance problems.
