Predictive Modeling in Healthcare: Challenges and Opportunities

By Prasanna Desikan, Ritu Khare, Jaideep Srivastava, Robert Kaplan, Joydeep Ghosh, Longjian Liu, Vipin Gopal

A workshop on Data Mining for Healthcare (DMH) was part of ICHI 2013. This report summarizes the discussions and recommendations of the workshop in general, and of the workshop panel discussion in particular. The panel, titled “Predictive modeling in healthcare: Challenges, opportunities, and realities”, brought together experts including physicians, data scientists, and policy makers, representing government, healthcare providers and payers, and academia.

Motivation

The healthcare industry is undergoing a major transformation, necessitated by the triple aim of improved quality, lower costs and better patient outcomes. Passage of the Affordable Care Act (ACA) is poised to increase the adoption of technology to improve healthcare efficiency and safety. Healthcare analytics adoption can occur at various levels, including tracking and prevention of medical errors, data integration, predictive modeling and personalized modeling. While there has been significant innovation and progress from a data mining and research perspective, challenges and opportunities remain. This report summarizes the viewpoints of the panelists, who are visionaries and enthusiasts in the field of predictive modeling.

Challenges

Analytics: There are many challenges in data collection and management, arising from incomplete, heterogeneous, incorrect, or inconsistent data. Identifying key data elements on the basis of high utility value (for conducting analysis and enabling actionable research) and feasibility of collection is important for effective data analytics.

Such analytics also need to address the problems of dealing with large numbers of potential predictors and the rarity of many of the events of interest.
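The rarity problem can be made concrete with a small sketch. The Python example below is illustrative only and was not presented at the workshop; the cohort size and event rate are invented for the purpose. It shows that a model which never predicts the event can still score very high on accuracy while identifying none of the patients of interest.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels where 1 marks the rare event."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_positives = sum(1 for t, p in pairs if t == 1 and p == 1)
    positives = sum(y_true)
    accuracy = correct / len(y_true)
    # Recall is the share of actual events that the model caught.
    recall = true_positives / positives if positives else 0.0
    return accuracy, recall


# Hypothetical cohort: 1,000 patients, 10 of whom experience the event.
y_true = [1] * 10 + [0] * 990

# A trivial "model" that always predicts that the event will not occur.
always_negative = [0] * len(y_true)

acc, rec = accuracy_and_recall(y_true, always_negative)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

For rare outcomes, measures such as recall or precision are usually more informative than overall accuracy.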

Privacy becomes critical as organizations collaborate and share data to enable an integrated EMR. There is a need to preserve privacy while sharing knowledge, and to retain most of the utility of aggregated data for predictive modeling and population studies while adhering to privacy constraints. The privacy-utility tradeoff becomes even more critical to understand when data from multiple sources is integrated, potentially leading to unintended consequences or leaks.
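One simple way to see the privacy-utility tradeoff is small-cell suppression of aggregated counts, a common disclosure-control practice. The sketch below is an illustration chosen for this summary, not a method proposed by the panel; the group labels, counts, and thresholds are hypothetical. Raising the minimum cell size hides more small groups (more privacy) but leaves fewer records represented in the released table (less utility).

```python
def suppress_small_cells(counts, min_cell_size):
    """Replace any count below min_cell_size with None before sharing."""
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


def retained_fraction(original, released):
    """Fraction of all records still represented after suppression."""
    total = sum(original.values())
    kept = sum(n for n in released.values() if n is not None)
    return kept / total if total else 0.0


# Hypothetical aggregated counts per patient group.
counts = {"A": 120, "B": 45, "C": 3, "D": 7}

for threshold in (5, 10):
    released = suppress_small_cells(counts, threshold)
    utility = retained_fraction(counts, released)
    print(threshold, released, round(utility, 3))
```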

Technology adoption: The main barriers to acceptance and realization of predictive modeling methods in healthcare can be studied at two levels, namely adoption and system usage. The first stems from a lack of understanding of the usefulness of these methods and a lack of alignment with strategic objectives. Clinicians are incredibly busy, and there have been few incentives for them to explore how analytics can make them more effective and efficient. Privacy and sensitivity issues associated with patient information also limit data access and sharing capabilities.

The second set of challenges is associated with actual system usage in demanding clinical workflows, systems, and environments. Clinicians face a variety of challenges while interfacing with electronic medical records on a day-to-day basis. Some usability challenges include the lack of support for information integration, data heterogeneity, too many options for each field, lack of visualization of summary information, sharing information in a privacy-safe manner, etc. Although EMRs provide support for entering structured information, clinicians often resort to unstructured notes due to the lack of time and system usability. This leads to natural language processing challenges and to data errors. More importantly, the difficulty in primary usage of EMRs leads to a low confidence in their value, which results in reduced interest in their secondary, and potentially very powerful purpose, namely for predictive clinical analytics.

Lack of participation of data scientists: Clinicians have traditionally relied heavily on biostatisticians for analysis and interpretation. As data collection capabilities change and large amounts of data are collected, there is scope to adopt techniques from large-scale data analytics solutions in other industries. Hence, it is increasingly important to leverage the skills of data scientists while developing analytical solutions. The acceptance and integration of such skill sets has been slow in healthcare as a whole, with the possible exception of payers, although there is a growing realization of the need.

Opportunities

The challenges discussed above, along with other legal and regulatory constraints, make healthcare a slow adopter of technology compared to other fields, such as finance and insurance. While there are many technical challenges that are of interest, the key opportunities include:

Research opportunities: Incorporating domain knowledge and real-world evidence to address data quality issues will be crucial to improving the effectiveness of follow-up predictive modeling efforts. Feature selection techniques will be necessary, as most healthcare data is very high-dimensional. Smart ensemble methods will play a crucial role in incorporating evidence from different sources. Finally, privacy-aware and knowledge-preserving data collaboration techniques are essential for successful collaboration and for the integration of data from different sources. One of the most important challenges is harmonizing data elements across data collection systems. Gaining consensus on what is measured may create new opportunities for epidemiological research. In the clinical practice arena, what gets measured may define what problems get clinical attention.
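As a minimal sketch of what feature selection can look like in practice, the Python example below ranks candidate predictors by the absolute value of their Pearson correlation with the outcome and keeps the top k. This simple filter is only one of many possible techniques and is not attributed to the panel; the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def top_k_features(features, outcome, k):
    """Return the k feature names most correlated (in absolute value) with outcome."""
    scores = {name: abs(pearson(vals, outcome)) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data for six patients; outcome 1 marks the event of interest.
outcome = [0, 1, 1, 0, 1, 0]
features = {
    "hba1c": [5.1, 6.8, 7.9, 5.4, 8.2, 5.0],
    "shoe_size": [9, 8, 10, 9, 8, 10],
    "site_code": [1, 1, 1, 1, 1, 1],
}

selected = top_k_features(features, outcome, 2)
print(selected)
```

A univariate filter like this ignores interactions between predictors, so in a real study it would typically be a first screening step rather than the whole selection procedure.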

Addressing the needs of clinicians within their workflow: Solutions should focus on making systems smarter and easier for physicians to work with. For example, a system that summarizes key points in a text format and answers questions (like IBM’s Watson or the clinical “Siri”) for clinicians as they look at patient records will let physicians focus on patient care rather than on data recording and data analysis. Usability and effectiveness will drive the adoption of technology.

Interdisciplinary design teams: There is strong agreement that physicians and data scientists need to work closely together to design and create effective informatics solutions. Beyond that, it is important to build a more cohesive and collaborative interdisciplinary team comprising clinicians, data scientists, biostatisticians, epidemiologists, policy makers, legal experts, and others. Otherwise, most healthcare systems will end up being designed within a narrow framework that doesn’t provide effective solutions, leading to dissatisfaction and a lack of belief in the efficacy of such systems.

Conclusions

Predictive modeling in healthcare is at the forefront of improving quality of care, reducing costs, and improving population health (triple aim). It has great potential to drive future models of care and is a key step towards personalized medicine.

For Further Reading

P. Desikan, R. Khare, “Data Mining for Healthcare 2013: Workshop Summary.” International Conference on Health Informatics (ICHI) 2013, September 09-11, 2013, Philadelphia, PA.

Contributors

Prasanna Desikan is currently Senior Research Scientist at the Division of Applied Research, Office of Clinical Excellence, Allina Health. He received his Ph.D. in Computer Science from the University of Minnesota, Twin Cities, USA.

Ritu Khare is a Research Fellow with the National Center for Biotechnology Information at the National Institutes of Health (NIH). She conducts health informatics research focused on data and text mining, natural language processing, information extraction, and data integration. She earned her Doctorate in Information Science in 2011 from the iSchool at Drexel University, in collaboration with the Drexel University College of Medicine.

Jaideep Srivastava is Professor of Computer Science & Engineering at the University of Minnesota, where he directs a laboratory focusing on research in Web Mining, Social Media Analytics, and Health Analytics. He has a PhD from the University of California, Berkeley.

Robert M. Kaplan, Ph.D. is Associate Director for Behavioral and Social Sciences and Director of the Office of Behavioral and Social Sciences Research (OBSSR) in the National Institutes of Health (NIH) Office of the Director. He is the author, co-author or editor of more than 18 books and over 500 articles or chapters.

Joydeep Ghosh is the Schlumberger Centennial Chair Professor of Electrical and Computer Engineering at the University of Texas, Austin. Dr. Ghosh’s research interests lie primarily in data mining and web mining, predictive modeling / predictive analytics and their applications to a wide variety of complex real-world problems, including extracting value from a variety of healthcare data. He received his Ph.D. from the University of Southern California.

Longjian Liu, MD, PhD, MSc, FAHA, is an associate professor of Epidemiology and Biostatistics at Drexel University School of Public Health, and associate professor of medicine at Drexel University College of Medicine. Dr. Liu’s main research covers cardiovascular disease and diabetes prevention, and the usage of hospital electronic medical records to monitor and predict disease risk and outcomes.

Vipin Gopal is the Vice President of Clinical Analytics at Humana, a Fortune 100 company. He is an expert in developing differentiating analytic competencies, and has previously led analytic functions in diverse companies ranging from industrial conglomerates to healthcare. Dr. Gopal obtained his Ph.D. from Carnegie Mellon University.

Aniruddha Datta received the Ph.D. degree from the University of Southern California in 1991. In August 1991, he joined the Department of Electrical and Computer Engineering at Texas A&M University where he is currently the J. W. Runyon, Jr. '35 Professor II. His areas of interest include adaptive control, robust control, PID control and Genomic Signal Processing.

Christopher C. Yang is an associate professor in the College of Computing and Informatics at Drexel University. He received his PhD in computer engineering from the University of Arizona. His recent research interests include healthcare informatics, social intelligence and technology, Web search and mining, knowledge management, and information visualization.

Rebecca Chiu heads up Business Development for MedHelp, the leading online social media and mobile health platform. She works with strategic partners to help them engage patients and personalize their services, as well as building the partner ecosystem for MedHelp's platform service. Ms. Chiu holds an M.A. and a B.A. in Economics from Yale University and an M.B.A. from The Wharton School.

Simon Lin is Director of the Biomedical Informatics Research Center at Marshfield Clinic Research Foundation and he holds the Dr. John Melski Endowed Physician Scientist at Marshfield Clinic. Dr. Lin received an MD degree in Medical Informatics at the School of Medicine, Peking University, Peking.

Akhil Kumar is a Professor of Information Systems at the Smeal College of Business at Penn State University. He received his Ph.D. from the University of California, Berkeley. His research interests are in healthcare IT, business process management systems, process mining and web services.

Christian Seeger is currently a Ph.D. student in the Databases and Distributed Systems group led by Prof. Alejandro Buchmann at TU Darmstadt. His research interests are middleware approaches and applications for on-body and ambient sensor networks.

Kristof Van Laerhoven obtained his Ph.D. at Lancaster University (UK). He heads the Embedded Sensing Systems lab at TU Darmstadt (Germany), funded by the Emmy Noether Programme of the German research foundation DFG. His research combines sensing systems with pattern recognition and machine learning to obtain adaptive and power-efficient systems.

IEEE Life Sciences

November 2013 Contributors