Data is Not a Gold Mine

Why Data Engineering is the Future of Healthcare

By Sundip Gorai, Chief Data Officer, GM of Data, AI & Analytics

From finance to marketing and education, digitalization has disrupted long-established practices and perspectives, bringing data to the foreground. Healthcare is no exception and has made significant progress in adopting innovative technologies. As digital tools collect patient records and help manage hospital performance, massive volumes of information are generated every day, offering a “goldmine” to support decision-making, improve patient outcomes and reduce healthcare costs. The metaphor is so familiar that we no longer question it.

However, data is not gold per se; its value depends on what you make of it. Its power is undeniable, but that power comes at a price: proliferating data creates new engineering challenges that organizations must tackle to thrive.

01 Sep 2023

Big Data is a Big Deal

Healthcare is full of valuable data. The most common sources of Big Data in healthcare include electronic health and medical records, personal health records, and data generated by widespread digital health tools such as wearable medical devices and mobile health apps. Every patient, test, scan, diagnosis, treatment plan, clinical trial, prescription and final health outcome produces a data point that can help improve how we deliver care in the future. Whether structured or unstructured, this data requires analysis to reveal the insights, trends and patterns that its sheer volume and variety of formats (text, images, graphics or video) hide from the naked eye.

From research organizations and companies, both for-profit and not-for-profit, to scientists, doctors, insurers and pharmaceutical companies, Big Data interests many players in the healthcare world and is driving dramatic medical progress.

Monitoring populations and guiding public health policies
Big Data helps us better understand patients, healthcare consumption and the health of the population in general. National health agencies process and analyze data from surveillance systems, surveys and medico-administrative databases to support and guide public health policies. Big Data makes it easier and more effective to monitor communities’ health knowledge, behaviors and attitudes in order to steer public action, track numerous pathologies and how they evolve, and detect unexpected health events.

Improving disease prevention and management
It is now possible to use multidimensional data collected over long periods on large populations to identify risk factors for certain diseases, such as cancer, diabetes, asthma and neurodegenerative diseases. These findings help shape prevention messages and programs targeting at-risk populations.

Big Data also enables the development of diagnostic assistance systems and personalized treatment tools based on processing large volumes of individual clinical data. It also helps organizations verify the effectiveness of treatments. In vaccine research, for example, thanks to artificial intelligence, immunologists now measure hundreds of parameters during clinical trials, such as cell counts, cell functionality and the expression of genes of interest, whereas a few years ago they had to limit themselves to the concentration of antibodies of interest.

Predicting epidemics
Access to such a vast source of information on the health of individuals in a given region makes it possible to pinpoint any rise in the incidence of disease or risky behavior and to alert the health authorities. Researchers use this data to build models and propose appropriate health measures. The HealthMap automated electronic information system, for example, aims to predict the occurrence of epidemics using data from a wide range of sources. Developed by American epidemiologists and computer scientists, the site aggregates disparate data from health departments and public bodies, official reports and the Internet, and updates continuously to identify health threats and alert populations. Open data is a leap forward in how we tackle global disease outbreaks.
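HealthMap's own detection methods are not described here. As a hedged illustration of what "pinpointing a rise in incidence" can mean in practice, the sketch below flags weeks whose reported case count exceeds a rolling baseline by more than two standard deviations, a deliberately simplified aberration-detection rule. The function name `flag_unusual_weeks`, the eight-week window, the two-standard-deviation threshold and the case counts are all illustrative assumptions.

```python
from statistics import mean, stdev


def flag_unusual_weeks(weekly_cases, baseline_weeks=8, z=2.0):
    """Return the indices of weeks whose case count exceeds the mean of the
    preceding `baseline_weeks` weeks by more than `z` sample standard deviations.

    Simplified illustration only: real surveillance systems also adjust for
    seasonality, reporting delays and population size.
    """
    alerts = []
    for i in range(baseline_weeks, len(weekly_cases)):
        baseline = weekly_cases[i - baseline_weeks:i]
        threshold = mean(baseline) + z * stdev(baseline)
        if weekly_cases[i] > threshold:
            alerts.append(i)
    return alerts


# Hypothetical weekly case counts for one region.
cases = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15, 31, 44]
print(flag_unusual_weeks(cases))  # → [10, 11]
```

Note that once a spike enters the rolling window it inflates the baseline for later weeks; a production rule would typically exclude already-flagged weeks from the baseline.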

Analyzing drug use and assessing risks
Big Data also helps scientific groups conduct pharmaco-epidemiological studies that provide information on the use, misuse, efficacy and risks of medicines. In addition, analyzing long-term data from cohorts or medico-economic databases enables healthcare professionals and scientists to observe many phenomena, in particular to link treatments to health events and to warn of specific risks or harmful interactions. According to research published in the Journal of the American College of Cardiology, researchers can identify and confirm previously unknown drug interactions by coupling data mining of adverse event reports and electronic health records with targeted laboratory experiments.
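The data-mining pipeline from the cited study is not reproduced here. To give a concrete sense of how adverse event reports can be mined for signals, the sketch below computes the proportional reporting ratio (PRR), a standard disproportionality measure in pharmacovigilance, swapped in as an illustration. The exposure can be a single drug or, for interaction screening, reports mentioning a pair of drugs together. The class and function names and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportCounts:
    """2x2 contingency counts from a spontaneous adverse event report database."""
    exposure_with_event: int     # a: reports with the drug (or drug pair) and the event
    exposure_without_event: int  # b: reports with the drug, without the event
    other_with_event: int        # c: all other reports with the event
    other_without_event: int     # d: all other reports without the event


def proportional_reporting_ratio(counts: ReportCounts) -> float:
    """PRR = [a / (a + b)] / [c / (c + d)].

    A value well above 1 means the event is reported disproportionately often
    with this exposure: a signal to investigate, not proof of causation.
    """
    exposed_rate = counts.exposure_with_event / (
        counts.exposure_with_event + counts.exposure_without_event
    )
    background_rate = counts.other_with_event / (
        counts.other_with_event + counts.other_without_event
    )
    return exposed_rate / background_rate


# Hypothetical counts: 20 of 1,000 reports for the exposure mention the event,
# versus 100 of 50,000 reports for everything else.
prr = proportional_reporting_ratio(ReportCounts(20, 980, 100, 49_900))
print(round(prr, 2))  # → 10.0
```

Signals like this are exactly what the targeted laboratory experiments mentioned above would then need to confirm.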

The Challenges Ahead for Big Data in Healthcare

The medico-economic management of healthcare establishments, public health decisions and even biomedical research increasingly rely on exploiting massive datasets. However, collecting and using such data still poses several technical challenges and ethical questions.

Sufficient storage capacity
The enormous volumes of available data raise technical challenges for storage and processing capacity. Research organizations use storage servers and supercomputers, which are sometimes pooled to cut costs.

Standardizing data
Another problem is the fragmentation of such massive amounts of data. The information collected is increasingly heterogeneous in terms of:

  • Nature: genomic, physiological, biological, clinical, social
  • Format: text, numerical values, signals, 2D and 3D images, genomic sequences
  • Source information systems: healthcare establishments, research laboratories, public databases

Standardization is essential to process and exploit such complex information properly before integrating it into databases or data warehouses. Informatics for Integrating Biology and the Bedside (i2b2) offers such standards. It enables care centres to compile all the data they collect in biomedical data warehouses, which researchers can query via web interfaces. During the Covid-19 pandemic, these standards enabled scientists to exploit data from electronic patient records and build common data models, giving healthcare professionals up-to-date clinical and epidemiological information.
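The sketch below is not i2b2's actual schema; it only illustrates the general idea behind standardization: mapping records from heterogeneous sources, with different field names and units, into one common representation before loading them into a warehouse. The source names (`hospital_a`, `lab_b`), their field names and the `Observation` structure are hypothetical, and the glucose conversion factor of about 18 (mg/dL per mmol/L) is an approximation.

```python
from dataclasses import dataclass

# Approximate factor: glucose in mg/dL divided by ~18 gives mmol/L.
GLUCOSE_MG_DL_PER_MMOL_L = 18.0


@dataclass(frozen=True)
class Observation:
    """One row in a hypothetical common data model."""
    patient_id: str
    concept: str  # shared concept name, e.g. "glucose"
    value: float
    unit: str


# Per-source mapping: which field holds the patient id and the glucose value,
# and how to convert that value into the common unit (mmol/L).
SOURCE_MAPPINGS = {
    "hospital_a": {"id": "pid", "glucose": "glu",
                   "to_common": lambda v: v / GLUCOSE_MG_DL_PER_MMOL_L},
    "lab_b": {"id": "patient", "glucose": "glucose_mmol",
              "to_common": lambda v: v},
}


def standardize(source: str, record: dict) -> Observation:
    """Translate one source-specific record into the common Observation form."""
    mapping = SOURCE_MAPPINGS[source]
    raw = float(record[mapping["glucose"]])
    return Observation(
        patient_id=str(record[mapping["id"]]),
        concept="glucose",
        value=round(mapping["to_common"](raw), 2),
        unit="mmol/L",
    )


# The same measurement, reported differently by two hypothetical sources.
records = [
    standardize("hospital_a", {"pid": 17, "glu": 99}),
    standardize("lab_b", {"patient": "17", "glucose_mmol": 5.5}),
]
print(records[0] == records[1])  # → True
```

Keeping the per-source rules in a single mapping table means adding a new source only requires a new entry, not new query logic downstream.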

Protecting personal data
In the US, HIPAA regulates how protected health information, including data held in electronic health records, is used and disclosed, and it gives individuals rights over the information concerning them, such as accessing it and authorizing many of the ways it is shared.

Anonymization is a process that applies a set of techniques to make it impossible in practice to identify a person by any means whatsoever, in an irreversible manner.

As anonymization and re-identification techniques evolve regularly, data controllers must monitor them continually to ensure that the data they produce remains anonymous over time. This monitoring must consider the technical means available, as well as other data sources that could be combined with the data to re-identify individuals.
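One widely used way to quantify re-identification risk is k-anonymity, swapped in here as an illustration: a dataset is k-anonymous when every combination of quasi-identifiers (attributes such as age or ZIP code that could be linked to outside data) is shared by at least k records. The sketch below measures k and coarsens quasi-identifiers to raise it. The records, the choice of quasi-identifiers and the generalization rules are hypothetical, and reaching k > 1 does not by itself meet the irreversibility standard described above; for example, if every record in a group shares the same diagnosis, that diagnosis is still disclosed.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group of records sharing the same
    combination of quasi-identifier values (records must be non-empty)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize(record):
    """Coarsen quasi-identifiers: exact age to a 10-year band, ZIP code to 3 digits."""
    decade = record["age"] // 10 * 10
    return {
        "age_band": f"{decade}-{decade + 9}",
        "zip3": record["zip"][:3],
        "diagnosis": record["diagnosis"],
    }


# Hypothetical patient records.
patients = [
    {"age": 34, "zip": "02139", "diagnosis": "asthma"},
    {"age": 37, "zip": "02141", "diagnosis": "diabetes"},
    {"age": 52, "zip": "02115", "diagnosis": "asthma"},
    {"age": 58, "zip": "02118", "diagnosis": "cancer"},
]

print(smallest_group_size(patients, ["age", "zip"]))  # → 1 (each record is unique)
released = [generalize(p) for p in patients]
print(smallest_group_size(released, ["age_band", "zip3"]))  # → 2
```

Because new external datasets can make previously safe combinations identifying again, a check like this is something to re-run periodically rather than once.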

In Conclusion

In the healthcare sector, Big Data refers in the broadest sense to all available health data collected from various sources. This data helps us better understand the healthcare system, identify risk factors for disease, diagnose patients, select treatments and monitor their effectiveness, and support pharmacovigilance and epidemiology. The added value for healthcare professionals, care centres and patients is indubitable, but it comes with logistical and ethical challenges.

To create accessible and actionable business intelligence from complex datasets, you need to deploy digital automation and artificial intelligence services. If you or your organization needs help with this, contact our healthcare team today.