Patient Data Sharing Hub¶
Robbie Records has decided to take action. He works for Hampton Hospital and is responsible for the centralized Patient Data Records. He receives a lot of requests for patient data, beyond the normal day-to-day uses when treating patients. These additional requests come from the medical staff working at the hospital as part of their research projects, as well as from pharmaceutical companies that are working with the hospital on clinical trials.
He currently has to manage these requests by hand, which means he often takes at least a week to fulfill a request, and sometimes he has to refuse them when he is overwhelmed with other work. He wants to automate the process so that he can focus on other tasks and improve the data sharing experience.
At the same time, the hospital has strict policies on patient confidentiality, security and privacy. He is audited regularly and needs to ensure that he is both implementing these policies AND can demonstrate to an auditor that all is working as expected.

How is patient data managed?¶
The patient data that Robbie manages is held largely in a number of PostgreSQL databases, plus some media files for medical images. There is a portal for the staff to view and update a particular patient's data. When a patient is being treated, new data from the various medical systems and equipment is added to the patient's data. This may arrive as files or as a stream of data that is fed into PostgreSQL. The PostgreSQL server, file system and portal are managed by the hospital's IT department, so Robbie is not responsible for them. As a result, they are backed up, upgraded and secure.
When Robbie is asked to share patient data with a third party, he creates a new database for their project and copies the data from the original database. This copy lets him demonstrate exactly what data was shared with a particular requester.
He maintains a set of documents in a directory (folder) on the filesystem - one for each project requesting data. The documents describe the purpose of the project, how the data will be used and which database has the copy of the data that was shared.
To actually share the data, Robbie typically exports the data into CSV files that are then zipped up with a password and delivered to where the team can access it. The password is generated by a secure password generator and is only shared with the team that requested the data.
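The export and password steps above could be sketched as follows. This is a minimal illustration, not Robbie's actual tooling: the function names, the password alphabet and length, and the sample columns are all assumptions. Note that Python's standard `zipfile` module cannot create password-protected archives, so the zipping step would need a separate tool and is left out here.

```python
import csv
import io
import secrets
import string

# Assumed alphabet and length; the hospital's real password policy is not stated.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def rows_to_csv(header: list[str], rows: list[tuple]) -> str:
    """Serialize query results to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample values, not real patient data.
csv_text = rows_to_csv(["patient_id", "heart_rate"], [(1, 72), (2, 88)])
password = generate_password()
```

Using `secrets` rather than `random` matters here because the password protects confidential data and must not be predictable.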
When Robbie is working with an external third party, such as a pharmaceutical company performing a clinical trial, there are often additional requirements, specified by the data sharing contract between the hospital and the pharmaceutical company. Robbie can perform simple transformations using SQL commands, but needs to get help from the department that generated the data if data values are outside acceptable ranges.
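A check for values outside acceptable ranges might look like the sketch below. The field names and limits are hypothetical; in practice they would come from the data sharing contract or from the department that generated the data.

```python
# Hypothetical acceptable ranges; real limits would come from the contract.
ACCEPTABLE_RANGES = {
    "heart_rate": (30, 220),
    "body_temp_c": (30.0, 43.0),
}


def find_out_of_range(rows, ranges=ACCEPTABLE_RANGES):
    """Return (row_index, field, value) for every value outside its range."""
    problems = []
    for index, row in enumerate(rows):
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is None:
                continue  # missing values would be handled by a separate check
            if not (low <= value <= high):
                problems.append((index, field, value))
    return problems


# Made-up sample rows, not real patient data.
sample_rows = [
    {"heart_rate": 72, "body_temp_c": 36.8},
    {"heart_rate": 400, "body_temp_c": 37.1},
    {"heart_rate": 65, "body_temp_c": 12.0},
]
issues = find_out_of_range(sample_rows)
```

The resulting list gives Robbie something concrete to pass to the originating department, rather than silently correcting or dropping the values himself.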
Robbie's vision¶
Robbie would like to create a patient data sharing hub that allows an in-house data requester to manage (create/view/amend) their data sharing request, including specifying the data they require and the purpose they will use it for. They may also need to agree to certain conditions on how the data must be managed once they have their copy.
Robbie also wants to handle data sharing requests from external third parties. In this case, the third party will not have direct access to the patient data sharing hub, but Robbie should be able to manage their request and requirements, as well as keeping track of any data sent to them.
Using Egeria¶
Robbie saw a presentation about the latest version of Egeria at a conference and decided to try it out. Egeria provides a robust platform for managing metadata and governance, which will be crucial for tracking and managing data sharing requests and ensuring compliance with regulations and contracts.

He downloads egeria-workspaces from GitHub and installs it on his local machine. He then starts customizing it for the Patient Data Sharing Hub.
Building a Data Dictionary for the Patient Data Sharing Hub¶
The first step is to build an inventory of the patient data fields he has on offer. He needs a data dictionary that data requesters can select from.
Robbie begins by requesting that Egeria perform a survey of the patient data and catalogue its schema. This process is called metadata discovery.
He then requests that Egeria create a skeleton data dictionary for the patient data sharing hub based on the database schema.
The skeleton data dictionary will be complete, in terms of the data fields and their types, but lacking in descriptions that explain the values, where they come from and what they mean. He will need to add these descriptions to the data dictionary.
Robbie requests an update form from Egeria for all the fields in the data dictionary. This form is a Markdown document with a section for each field. Filling it in is a big task, but an interesting one. Robbie is familiar with many of the fields, but he also discovers data he was not aware of. He shares the Markdown document with colleagues, who help him fill in some of the details he is not sure of.
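To give a feel for a Markdown document with one section per field, here is a small sketch. It does not reproduce Egeria's actual form layout, which is not shown in this scenario; the headings, labels and field names are assumptions made purely for illustration.

```python
def render_update_form(fields):
    """Build a Markdown document with one section per data field."""
    lines = ["# Data dictionary update form", ""]
    for field in fields:
        lines.append(f"## {field['name']}")
        lines.append("")
        lines.append(f"Type: {field['type']}")
        lines.append("")
        lines.append("Description:")  # left blank for a person to fill in
        lines.append("")
    return "\n".join(lines)


# Hypothetical fields taken from an imagined patient table.
fields = [
    {"name": "patient_id", "type": "integer"},
    {"name": "admission_date", "type": "date"},
]
form = render_update_form(fields)
```

Because the form is plain text, it can be passed around by email or kept under version control while colleagues add their descriptions.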
The document itself sparks interest and discussion. A number of people request a copy when it is finished.
Robbie loads the Markdown document into Egeria so that all the descriptions are added to the data dictionary in the open metadata repository. He uses Egeria's portal to check the descriptions were added correctly. He also notices that each field in the data dictionary is linked to the corresponding schema elements representing the tables and columns in the patient database. This linkage could allow an automated pipeline to navigate from the data fields requested by a data requestor to the actual data in the database.
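The navigation from requested data fields to database columns could work roughly as sketched below. The link map is a hand-written stand-in for the relationships held in the open metadata repository; the field, table and column names are hypothetical, and no Egeria API is used.

```python
# Hypothetical links from data dictionary fields to (table, column) pairs.
FIELD_TO_COLUMN = {
    "Patient Identifier": ("patient", "patient_id"),
    "Date of Birth": ("patient", "date_of_birth"),
    "Heart Rate": ("observation", "heart_rate"),
}


def columns_by_table(requested_fields, links=FIELD_TO_COLUMN):
    """Group the columns behind the requested fields by their table."""
    grouped = {}
    for field in requested_fields:
        if field not in links:
            raise KeyError(f"No schema link for data field: {field}")
        table, column = links[field]
        grouped.setdefault(table, []).append(column)
    return grouped


def select_statements(grouped):
    """Produce one SELECT per table.

    Identifiers come from the trusted link map, never from free-text input.
    """
    return [
        f"SELECT {', '.join(columns)} FROM {table};"
        for table, columns in grouped.items()
    ]


grouped = columns_by_table(["Patient Identifier", "Heart Rate", "Date of Birth"])
statements = select_statements(grouped)
```

A requested field with no link raises an error instead of being skipped, so a request can never be fulfilled with less data than was agreed without someone noticing.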
He requests an HTML report of the data dictionary and installs it in Egeria's webserver. It creates a nice website for the data dictionary that is easy to read and navigate. This could be the site used by future data requesters to find out what data is available. However, it is only accessible through his machine at the moment, and he is not sure about the security of opening it up to others. He puts the idea to one side for now. Instead, he requests a Markdown report of the completed data dictionary.
Robbie is pleased with the result and shares it with his colleagues. They are impressed with the level of detail and the quality of the descriptions. They also appreciate the effort that went into creating the document. He also shows them the website and they agree that would be useful too.
Creating a data sharing request form¶
Now Robbie needs to design a form that data requesters fill in to describe the data they want and the purpose for it.
Creating the solution blueprint¶
Processing a data sharing request from a member of the hospital medical staff¶
Processing a data sharing request from a third party¶
The IT Team arrive ...¶
Once the Patient Data Sharing Hub is up and running and Robbie is managing data sharing requests from medical staff and external third parties, the hospital's IT team hear about it. They visit Robbie to express their concerns at him running his own system. This is a critical moment for Robbie's new system. With the manual system, auditing was always a major headache since he had to reconstruct the specific data sharing process every time he needed to demonstrate compliance with regulations and contracts.
Egeria's portal includes reports that enable Robbie to demonstrate how the system works at a general level and then dive into specific cases. There are details on the users with access and exactly what they have access to. He can show the data sharing agreements for each request and exactly what data was shared.
The system is containerized to limit the vectors for a security breach. The IT team examines the setup and agrees that it looks secure, and that Robbie has covered the privacy and security requirements well. However, there are no backups, and they wonder whether Robbie is willing to manage all of the patching and upgrading of the system over time. They would also like to run some security scans on the software and perform some penetration tests.
The IT team offers to take over the maintenance and backup of the system. Robbie will retain admin access to allow him to make improvements. The IT team will monitor the hub and ensure that it remains up-to-date and backed up. They also offer to securely host his website containing the solution descriptions and data dictionary.
This seems a reasonable agreement to Robbie - in fact, just perfect :).
Conclusion¶
With the patient data sharing hub in place, Robbie can now manage his data sharing requests and ensure that they are compliant with regulations and contracts.
His reputation grew in the hospital as a person who understood how to manage data. He was invited to research planning meetings and asked to take ownership of more data sources. Some of the research staff petitioned for more funding for Robbie so he could expand the types of data on offer. This resulted in:
- A digital product catalog for popular data sets.
- A glossary of terms to share understanding of complex concepts and data values.
- Hosting of research project data, removing the need for many of the research teams to manage their own copies of the data.
