Templated cataloguing¶
When a new resource is catalogued, the asset catalog entry of a similar resource can be used as a template for the new resource's asset. The copy covers every element anchored to the template asset, together with the relationships between them, plus the relationships that link these elements to elements outside the anchoring scope. This means that the new asset can carry governance metadata attachments, not just the technical metadata extracted from the digital resource.
Templated cataloguing can be used for any type of element, not just assets. It has the advantages that:
- The resulting catalogued elements are much more consistent with one another, making them easier for people to understand and work with.
- The increased consistency simplifies automation.
- Much of the complexity involved in creating a new catalog entry is hidden from the cataloguing user.
- Required classifications and governance relationships can be guaranteed to be included in each catalog entry.
An example¶
Peter Profile is responsible for cataloguing the weekly patient measurements supplied by the various hospitals as part of a clinical trial. These measurements come with certain terms and conditions (also known as a license). Coco Pharmaceuticals must not only adhere to the license, but also prove that it is doing so. For that reason, when the measurements are catalogued, the asset for the measurements data set is linked to the license as well as to other elements that help to ensure that the measurements data sets are appropriately used and governed.
Figure 1 shows Peter making calls to Egeria to catalog the first set of measurements received for the clinical trial. The catalog entry includes an asset that represents the data set and is linked to the license. It also includes a connection that allows the data scientist to connect to the data set and access the data, and a schema showing the structure of the data in the data set. The data fields identified in the schema each link to the glossary term that describes the meaning of the data stored in the field. There are also two classifications on the asset:
- AssetZoneMembership - The governance zones that the asset is a member of. This controls who can access the asset and its related metadata elements such as the connection and the schema.
- Ownership - The owner of the data set. This is the person who is accountable for ensuring that Coco Pharmaceuticals adheres to the license.
Figure 1: In week 1, Peter manually creates the asset and links it to the governance elements needed to ensure the data set is used and protected as laid out in the license.
Without templating, Peter would need to issue the same sequence of (30+) requests to catalog each of the weekly results from each of the hospitals. This is a lot of work for Peter, particularly as the number of clinical trials and participating hospitals rises. He may also make a mistake and forget one of the steps in the cataloguing process.
What if the catalog entry for the Week 1 measurements could be used as a template for cataloguing the subsequent weeks' measurements as shown in figure 2?
Figure 2: For subsequent weeks, the week 1 entry could be used as a template. The result is an asset for each data set with a connection and a schema, along with the ownership and zone membership classifications. All the assets are linked to the license and the data fields in each schema are linked to the correct glossary terms.
This is the idea behind templated cataloguing. A template that includes the common settings for a set of digital resources is defined and this template is used when cataloguing these resources.
Figure 3 shows a set of templates used by Coco Pharmaceuticals when cataloguing their digital landscape. There are different templates for different types of digital resources. Each includes the classifications and relationships that are relevant for the resources that it is used to catalog. The templates are decorated with the Template classification to identify that they do not represent real digital resources and should be used only as templates.
Figure 3: A set of templates defined to use when cataloguing digital resources
When a template is used in cataloguing a digital asset, the caller needs to supply the values that must be unique for the digital asset. These are typically the qualifiedName, displayName and description, and may also include the networkAddress for its connection's endpoint. These values override those in the template.
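The override step can be pictured as a merge of the caller's values over the template's properties. The following is a minimal Python sketch using plain dictionaries. It is not Egeria's API: the function name apply_overrides, the fileType property, the sample values and the rule that qualifiedName is mandatory are all assumptions made for illustration.

```python
from copy import deepcopy

# Hypothetical template properties; names and values are illustrative only.
TEMPLATE_ASSET = {
    "qualifiedName": "template:clinical-trial-measurements",
    "displayName": "Clinical trial measurements template",
    "description": "Template for weekly patient measurements.",
    "fileType": "csv",  # a common setting that every new entry inherits
}


def apply_overrides(template_properties, overrides):
    """Return a copy of the template properties with caller-supplied values on top."""
    # Assumption: the new element must always be given its own qualifiedName.
    if not overrides.get("qualifiedName"):
        raise ValueError("qualifiedName must be supplied for the new element")
    new_properties = deepcopy(template_properties)  # leave the template untouched
    new_properties.update(overrides)                # caller's values win
    return new_properties


week2 = apply_overrides(
    TEMPLATE_ASSET,
    {
        "qualifiedName": "measurements:week2",
        "displayName": "Week 2 measurements",
        "description": "Patient measurements received in week 2.",
    },
)
```

In this sketch, week2 keeps the template's fileType but carries its own qualifiedName, displayName and description, and the template itself is unchanged.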
Egeria uses the Anchors classification to determine which elements linked to the template are duplicated and which elements are simply linked to the new catalog entry. In figure 2, for example, the connection and schema are anchored to the asset whilst the glossary terms and license are not. This means that copies of the connection and schema elements are made for the new catalog entry, whilst the glossary terms and license just receive new relationships to the new catalog entry.
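The copy-versus-link rule can be sketched with a toy in-memory model. This is an illustration of the idea only, not Egeria's implementation: the Element and Catalog classes, the anchored_to field (standing in for the Anchors classification), the create_from_template function and the lowercase relationship labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so elements can be dictionary keys
class Element:
    name: str
    anchored_to: "Element | None" = None  # toy stand-in for the Anchors classification


@dataclass
class Catalog:
    elements: list = field(default_factory=list)
    relationships: list = field(default_factory=list)  # (label, end1, end2) tuples


def create_from_template(catalog, template, new_name):
    """Copy the template and its anchored elements; re-link everything else."""
    # The template's scope is the template plus every element anchored to it.
    anchored = [e for e in catalog.elements if e.anchored_to is template]
    new_root = Element(new_name)
    copies = {template: new_root}
    for element in anchored:
        copies[element] = Element(element.name, anchored_to=new_root)
    catalog.elements.extend(copies.values())
    # Duplicate each relationship that touches the scope. Ends inside the scope
    # are swapped for their copies; ends outside it are reused as they are.
    for label, end1, end2 in list(catalog.relationships):
        if end1 in copies or end2 in copies:
            catalog.relationships.append(
                (label, copies.get(end1, end1), copies.get(end2, end2))
            )
    return new_root


catalog = Catalog()
template = Element("Week 1 measurements")
connection = Element("Connection", anchored_to=template)
schema = Element("Schema", anchored_to=template)
license_ = Element("License")
term = Element("Glossary term")
catalog.elements += [template, connection, schema, license_, term]
catalog.relationships += [
    ("connection", connection, template),
    ("schema", template, schema),
    ("license", template, license_),
    ("meaning", schema, term),
]

week2 = create_from_template(catalog, template, "Week 2 measurements")
```

After the call, the toy catalog holds a new asset with its own connection and schema, while the single license and glossary term elements each gain one extra relationship pointing at the new entry.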
Finally, when a template is used, it is linked to the resulting element with the SourcedFrom relationship. This makes it easier to identify the elements that need changing if the template needs to be corrected or enhanced at a later date.
Figure 4: The SourcedFrom relationship links a template to the elements that are created from it
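As a rough illustration of why this link is useful, the sketch below finds every element created from a given template by filtering a toy list of relationships. The tuple layout, the direction of the relationship ends, the element names and the function elements_sourced_from are assumptions for this example and are not taken from Egeria's API.

```python
# Toy relationship store: (relationship_type, created_element, template) tuples.
# The ordering of the two ends is an assumption made for this sketch.
relationships = [
    ("SourcedFrom", "measurements:week2", "template:measurements"),
    ("SourcedFrom", "measurements:week3", "template:measurements"),
    ("SourcedFrom", "landing-area:files", "template:data-folder"),
]


def elements_sourced_from(relationships, template_name):
    """List the elements that were created from the named template."""
    return [
        created
        for rel_type, created, template in relationships
        if rel_type == "SourcedFrom" and template == template_name
    ]


to_review = elements_sourced_from(relationships, "template:measurements")
print(to_review)  # ['measurements:week2', 'measurements:week3']
```

If the measurements template is later corrected, the returned list is the set of catalog entries to revisit.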
The scope of a template¶
The scope of a template is controlled by the Anchors classification. In the picture below, the elements in blue represent the scope of the template. The elements in white show the new elements created from the template. The relationships to the elements in green are duplicated during the templating process, but the green elements themselves are not duplicated.
This means it is possible to navigate from, say, the license to each of the assets bound by the terms of the license.
Support for templated cataloguing¶
The Template Manager OMVS provides the ability to set up templates and the Automated Curation OMVS provides the ability to locate and use templates to catalog new assets.
Adding automation¶
It is also possible to use templates in the integration connectors running during Integrated cataloguing. The connector is typically passed, in the CatalogTarget relationship, the qualified names of the templates that it should use.