By Prasanna Desikan, Ritu Khare, Jaideep Srivastava, Robert Kaplan, Joydeep Ghosh, Longjian Liu, Vipin Gopal
A workshop on Data Mining for Healthcare (DMH) was part of ICHI 2013. This report summarizes the discussions and recommendations of the workshop in general, and of the workshop panel discussion in particular. The panel, titled “Predictive modeling in healthcare: Challenges, opportunities, and realities”, brought together experts including physicians, data scientists, and policy makers, representing government, healthcare providers and payers, and academia.
The healthcare industry is undergoing a major transformation, necessitated by the triple aim of improved quality, lower costs and better patient outcomes. Passage of the Affordable Care Act (ACA) is poised to increase the adoption of technology to improve healthcare efficiency and safety. Healthcare analytics adoption can occur at various levels, including tracking and prevention of medical errors, data integration, predictive modeling and personalized modeling. While there has been significant innovation and progress from a data mining and research perspective, challenges and opportunities remain. This report summarizes the viewpoints of the panelists, who are visionaries and enthusiasts in the field of predictive modeling.
Analytics: There are many challenges in data collection and management, arising from incomplete, heterogeneous, incorrect, or inconsistent data. Identifying key data elements based on their utility (for conducting analysis and enabling actionable research) and the feasibility of collecting them is important for effective data analytics.
Such analytics also need to address the problems of dealing with large numbers of potential predictors and the rarity of many of the events of interest.
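To make the rare-event problem concrete, the following is a minimal sketch, not something presented at the workshop. It assumes a hypothetical cohort of 1,000 patients in which 10 experience the event, and shows that a model that never predicts the event still looks accurate while detecting no cases. The function name and the cohort numbers are illustrative assumptions.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_positives = sum(t == 1 and p == 1 for t, p in pairs)
    positives = sum(y_true)
    accuracy = correct / len(pairs)
    # Recall is undefined without positive cases; report 0.0 in that case.
    recall = true_positives / positives if positives else 0.0
    return accuracy, recall


# Hypothetical cohort: 10 events among 1,000 patients (assumed numbers).
y_true = [1] * 10 + [0] * 990
# A trivial "model" that never predicts the event.
always_negative = [0] * 1000

accuracy, recall = accuracy_and_recall(y_true, always_negative)
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Under these assumed numbers, accuracy alone says little about whether rare events are found, which is why measures such as recall are usually reported alongside it.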
Privacy becomes critical as organizations collaborate and share data to enable an integrated EMR. There is a need to preserve privacy while sharing knowledge, and to retain most of the utility of aggregated data for predictive modeling and population studies while adhering to privacy constraints. The privacy-utility tradeoff becomes even more critical to understand when data from multiple sources is integrated, potentially leading to unintended consequences or leaks.
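One simple way to picture the privacy-utility tradeoff is small-cell suppression in aggregated counts. The sketch below is only an illustration under stated assumptions: the field names, the example records, and the threshold `k=5` are hypothetical, and the panel did not prescribe this or any other specific technique.

```python
from collections import Counter


def suppressed_counts(records, keys, k=5):
    """Count records per group and hide any group smaller than k.

    Groups below the threshold are reported as None, which protects
    individuals in small groups at the cost of some analytic detail.
    """
    groups = Counter(tuple(record[key] for key in keys) for record in records)
    return {group: (n if n >= k else None) for group, n in groups.items()}


# Hypothetical, made-up records described by two quasi-identifiers.
records = (
    [{"age_band": "60-69", "zip3": "191"}] * 6
    + [{"age_band": "80-89", "zip3": "191"}] * 2
)

table = suppressed_counts(records, ["age_band", "zip3"], k=5)
for group, count in table.items():
    print(group, "suppressed" if count is None else count)
```

Raising `k` hides more groups and so strengthens protection while reducing the utility of the shared table; lowering it does the reverse.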
Technology adoption: The main barriers to the acceptance and realization of predictive modeling methods in healthcare can be studied at two levels, namely adoption and system usage. Barriers to adoption stem from a lack of understanding of the usefulness of predictive modeling and a lack of alignment with strategic objectives. Clinicians are incredibly busy, and there have been few incentives for them to explore how analytics can make them more effective and efficient. Privacy and sensitivity issues associated with patient information also limit data access and sharing capabilities.
The second set of challenges is associated with actual system usage in demanding clinical workflows, systems, and environments. Clinicians face a variety of challenges while interfacing with electronic medical records on a day-to-day basis. Usability challenges include the lack of support for information integration, data heterogeneity, too many options for each field, the lack of visualization of summary information, and the difficulty of sharing information in a privacy-safe manner. Although EMRs provide support for entering structured information, clinicians often resort to unstructured notes due to lack of time and poor system usability. This leads to natural language processing challenges and to data errors. More importantly, the difficulty in primary usage of EMRs leads to low confidence in their value, which results in reduced interest in their secondary, and potentially very powerful, purpose: predictive clinical analytics.
Lack of participation of data scientists: Clinicians have traditionally relied heavily on biostatisticians for analysis and interpretation. As data collection capabilities change and large amounts of data are collected, there is scope to adopt techniques from large-scale data analytics solutions in other industries. Hence, it is increasingly important to leverage the skills of data scientists while developing analytical solutions. The acceptance and integration of such skill sets has been slow in healthcare as a whole, with the possible exception of payers, although there is a growing realization of the need.
The challenges discussed above, along with other legal and regulatory constraints, make healthcare a slow adopter of technology compared to other fields, such as finance and insurance. While there are many technical challenges that are of interest, the key opportunities include:
Research opportunities: Incorporating domain knowledge and real-world evidence to address data quality issues will be crucial to improving the effectiveness of follow-up predictive modeling efforts. Feature selection techniques will be necessary, as most healthcare data is very high-dimensional. Smart ensemble methods will play a crucial role in incorporating evidence from different sources. Finally, privacy-aware and knowledge-preserving data collaboration techniques are essential for successful collaboration and for the integration of data from different sources. One of the most important challenges is harmonizing data elements across data collection systems. Gaining consensus on what is measured may create new opportunities for epidemiological research. In the clinical practice arena, what gets measured may define what problems get clinical attention.
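As a rough illustration of what feature selection means in this setting, the sketch below ranks candidate predictors by the absolute value of their Pearson correlation with an outcome and keeps the strongest ones. This is one simple filter among many and is not a method the panel endorsed; the column names, the values, and the helper functions are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def top_features(columns, outcome, n_keep):
    """Keep the n_keep columns most correlated (in absolute value) with outcome."""
    scores = {name: abs(pearson(values, outcome)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]


# Made-up data for six hypothetical patients.
columns = {
    "hba1c": [5.1, 5.4, 7.9, 8.3, 6.0, 9.1],
    "shoe_size": [9, 9, 9, 9, 9, 9],  # constant, so uninformative here
    "prior_admissions": [0, 1, 2, 3, 0, 4],
}
readmitted = [0, 0, 1, 1, 0, 1]

selected = top_features(columns, readmitted, n_keep=2)
print(selected)
```

A univariate filter like this ignores interactions between predictors, so in practice it would be only a first pass before more careful modeling guided by domain knowledge.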
Addressing the needs of clinicians within their workflow: Solutions should focus on making systems smarter and easier for physicians to work with. For example, a system that summarizes key points in a text format and answers questions (like IBM’s Watson or the clinical “Siri”) for clinicians as they look at patient records will let physicians focus on patient care rather than data recording and data analysis. Usability and effectiveness will drive the adoption of technology.
Interdisciplinary design teams: There is strong agreement that physicians and data scientists need to collaborate closely to create effective informatics solutions. Beyond that, it is important to build a more cohesive interdisciplinary team comprising clinicians, data scientists, biostatisticians, epidemiologists, policy makers, legal experts, and others. Otherwise most healthcare systems will end up being designed within a narrow framework that does not provide effective solutions, leading to dissatisfaction and a lack of belief in the efficacy of such systems.
Predictive modeling in healthcare is at the forefront of improving quality of care, reducing costs, and improving population health (triple aim). It has great potential to drive future models of care and is a key step towards personalized medicine.
For Further Reading
P. Desikan, R. Khare, “Data Mining for Healthcare 2013: Workshop Summary.” International Conference on Health Informatics (ICHI) 2013, September 09-11, 2013, Philadelphia, PA.
Prasanna Desikan is currently Senior Research Scientist at the Division of Applied Research, Office of Clinical Excellence, Allina Health. He received his Ph.D. in Computer Science from the University of Minnesota, Twin Cities, USA.
Ritu Khare is a Research Fellow with the National Center for Biotechnology Information at the National Institutes of Health (NIH). She conducts health informatics research focused on data and text mining, natural language processing, information extraction, and data integration. She earned her Doctorate in Information Science in 2011 from the iSchool at Drexel University, in collaboration with the Drexel University College of Medicine.
Jaideep Srivastava is Professor of Computer Science & Engineering at the University of Minnesota, where he directs a laboratory focusing on research in Web Mining, Social Media Analytics, and Health Analytics. He has a PhD from the University of California, Berkeley.
Robert M. Kaplan, Ph.D. is Associate Director for Behavioral and Social Sciences and Director of the Office of Behavioral and Social Sciences Research (OBSSR) in the National Institutes of Health (NIH) Office of the Director. He is the author, co-author or editor of more than 18 books and over 500 articles or chapters.
Joydeep Ghosh is the Schlumberger Centennial Chair Professor of Electrical and Computer Engineering at the University of Texas, Austin. Dr. Ghosh’s research interests lie primarily in data mining and web mining, predictive modeling / predictive analytics and their applications to a wide variety of complex real-world problems, including extracting value from a variety of healthcare data. He received his Ph.D. from the University of Southern California.
Longjian Liu, MD, PhD, MSc, FAHA, is an associate professor of Epidemiology and Biostatistics at Drexel University School of Public Health, and associate professor of medicine at Drexel University College of Medicine. Dr. Liu’s main research covers cardiovascular disease and diabetes prevention, and the usage of hospital electronic medical records to monitor and predict disease risk and outcomes.
Vipin Gopal is the Vice President of Clinical Analytics at Humana, a Fortune 100 company. He is an expert in developing differentiating analytic competencies, and has previously led analytic functions in diverse companies ranging from industrial conglomerates to healthcare. Dr. Gopal obtained his Ph.D. from Carnegie Mellon University.