Black-box machine learning: implications for healthcare

With self-driving cars and intelligent robots dominating the headlines, it appears that we are on the cusp of an era in which artificial intelligence (AI) plays a visible role in our lives. Healthcare is no exception, and is forecast to benefit in multiple ways from advances in machine learning, a type of AI. [1]

Machine learning describes algorithms whose purpose is to categorise or predict information by first learning mathematical relationships in existing datasets, and then identifying similar patterns in novel data. [1] In healthcare, this could involve making diagnoses based on medical images, or identifying targeted treatments based on a combination of genomic, molecular, and behavioural data. [2,3] Machine learning is also being applied to areas such as medical monitoring, epidemiology, and healthcare resource management. [4–6]
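To make this two-step description (learn relationships from existing data, then apply them to new data) concrete, here is a minimal sketch in Python of a nearest-centroid classifier, one of the simplest learn-then-predict methods. It is illustrative only. The function names, the two features, and the ‘low risk’/‘high risk’ labels are hypothetical. The numbers are made up rather than clinical data, and real healthcare models are far more complex.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Learning step: compute one centroid (mean feature vector) per label
    from the existing, already-labelled dataset."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Prediction step: assign a new sample to the label whose centroid is
    closest by squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Hypothetical toy data: two made-up measurements per patient, not real clinical values.
train_x = [[1.0, 1.2], [0.8, 1.0], [3.1, 2.9], [3.0, 3.2]]
train_y = ["low risk", "low risk", "high risk", "high risk"]

model = fit_centroids(train_x, train_y)
print(predict(model, [2.9, 3.0]))  # prints: high risk
```

This toy model's ‘reasoning’ is just a distance to two averages, so it is easy to inspect. The interpretability problems discussed below arise when the learned relationships involve thousands or millions of parameters.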

Despite talk of ‘superintelligent’, autonomous AI, the machine learning employed in healthcare will act as a tool that supports and refines specific tasks performed by human professionals. [7] Humans and algorithms will work collaboratively, rather than independently. For this collaboration to work, it is necessary to understand who does what and, importantly, why. [7]

Machine learning algorithms use complex mathematical principles to make predictions at a speed unattainable by humans, but the logic of their reasoning is often difficult to interpret, even for experts. [8] This raises numerous issues for the integration of machine learning into routine practice, particularly in the high-stakes healthcare sector. The interpretability of these algorithms is crucial not only for legal liability and regulatory processes, but also for acceptance of the technology by healthcare professionals and patients alike. [8] As such, it is imperative that we tackle this ‘black-box’ aspect.

Like humans, algorithms make mistakes. When humans and algorithms work together, however, accountability becomes ambiguous. [9] Who should carry the burden of liability: the clinician using the algorithm, the manufacturer who produced it, or the healthcare facility overseeing the practice? Concerns about mistaken algorithms are not hypothetical: last year, a facial recognition system at a US state agency erroneously identified someone as a criminal suspect, automatically revoking their driver’s license. In healthcare, errors can be far costlier, and the legislative frameworks currently in place are ill-equipped to deal with such scenarios. [9]

The inscrutability of machine learning also undermines wider trust in the technology. If neither patients nor healthcare professionals can understand how an algorithm handles personal data, how it reaches a diagnosis, or how it recommends treatments, acceptance may be difficult to achieve. [8] The newly approved EU General Data Protection Regulation (GDPR) recognises this lack of transparency and may help remedy it. [10] Taking effect in 2018, this legislation restricts ‘decisions based solely on automated processing, including profiling’, that significantly affect individuals. [10,11] The aim is to make human oversight a vital component of any algorithm implementation. The GDPR also calls for individuals to be given ‘meaningful information about the logic involved’ in algorithmic decision-making, a provision often referred to as the ‘right to explanation’. [11]

Interpretability is likewise crucial for regulatory bodies performing health technology assessments. [8] Currently, there is no specific guidance on the regulatory requirements for machine learning technology in healthcare. [7,8] In January 2017, the US Food and Drug Administration (FDA) approved Arterys, medical imaging software that helps doctors diagnose heart problems. This was the FDA’s first approval of machine learning software. It rested on demonstrating that the technology performs at least as accurately as humans, without any assessment of the algorithm itself. Because medicine already relies on many interventions whose mechanisms are poorly understood, it is reasonable to argue that a demonstration of efficacy and minimal adverse risk is all that approval requires. However, a distinctive feature of Arterys and similar technology is that the algorithm continues to update itself as it encounters new cases, so it is unclear how often assessment and approval should be repeated. To address these issues, regulatory bodies must be equipped with combined expertise in machine learning and healthcare. [7,8]

Despite the challenges, researchers are working to open up the technology. A lab at MIT has recently developed an algorithm that not only outputs decisions but also supports them with rationales. Using such self-justifying algorithms in tandem with software like fairML, which audits black-box algorithms, may help to ameliorate some of the issues around transparency. [14,15] However, such work is in its infancy, and questions remain about how precisely to define, develop, and evaluate ‘interpretability’. [16,17] For example, in the context of accountability, which is more valuable: a breakdown of how raw data are treated and sorted as they are fed into an algorithm, or an account of how the algorithm then analyses this information? Interpretability requirements also depend strongly on context, including where (e.g., ICU versus primary care) and by whom (e.g., pathologists versus nurses) the technology will be used. [16]
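One common way to audit a model that can be queried but not inspected is permutation importance: shuffle one input at a time and measure how much the model's outputs change. The Python sketch below illustrates that general idea. It is not fairML's actual method or API. The `black_box` function, the three synthetic features, and the helper names are all hypothetical, chosen so that the ‘hidden’ dependence is known in advance.

```python
import random


def black_box(row):
    """Hypothetical opaque model: an auditor can query it but not read its logic.
    (For this demo it secretly depends only on feature 0.)"""
    return 1 if row[0] > 0.5 else 0


def agreement(model, rows, reference):
    """Fraction of rows where the model's output matches the reference labels."""
    return sum(model(r) == y for r, y in zip(rows, reference)) / len(rows)


def permutation_importance(model, rows, reference, seed=0):
    """For each feature, shuffle that column across rows and record how much
    agreement with the reference labels drops. Larger drop = more influential."""
    rng = random.Random(seed)
    baseline = agreement(model, rows, reference)
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        perturbed = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(baseline - agreement(model, perturbed, reference))
    return importances


# Synthetic audit set: 200 rows of three random features. The reference labels are
# the black box's own outputs, so the audit measures what drives the model itself.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
reference = [black_box(r) for r in rows]

print(permutation_importance(black_box, rows, reference))
```

Here the first feature should show a large drop in agreement and the other two none, revealing which input the black box relies on without opening it up. Note that such output-level audits indicate which inputs matter, not why they matter.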

It is clear that more careful consideration of ethical, legal, and logistical concerns is necessary to ensure the safe and effective implementation of machine learning in healthcare. Common to all three is the question of algorithm interpretability. To delineate and tackle this issue, collaboration between data scientists, healthcare professionals, and policymakers is key. Only then will we be able to employ these technological advances for the greater good of patients.

Matthew J Reid is a first-year PhD student at the University of Oxford specialising in Clinical Neurosciences. He has a background in evidence-based medicine and is interested in the application of innovative technology to healthcare policy and decision-making.


[1] Z. Obermeyer, E.J. Emanuel, Predicting the Future — Big Data, Machine Learning, and Clinical Medicine, N. Engl. J. Med. 375 (2016) 1216–1219. doi:10.1056/NEJMp1606181.

[2] D. Shen, G. Wu, H.-I. Suk, Deep Learning in Medical Image Analysis, Annu. Rev. Biomed. Eng. 19 (2017). doi:10.1146/annurev-bioeng-071516-044442.

[3] S.M. Reza Soroushmehr, K. Najarian, Transforming big data into computational models for personalized medicine and health care, Dialogues Clin. Neurosci. 18 (2016) 339–343.

[4] M. McManus, D. Baronov, M. Almodovar, P. Laussen, E. Butler, Novel risk-based monitoring solution to the data overload in intensive care medicine, in: 52nd IEEE Conf. Decis. Control, 2013: pp. 763–769. doi:10.1109/CDC.2013.6759974.

[5] S.I. Hay, D.B. George, C.L. Moyes, J.S. Brownstein, Big Data Opportunities for Global Infectious Disease Surveillance, PLOS Med. 10 (2013) e1001413. doi:10.1371/journal.pmed.1001413.

[6] D. He, S.C. Mathews, A.N. Kalloo, S. Hutfless, Mining high-dimensional administrative claims data to predict early hospital readmissions, J. Am. Med. Inform. Assoc. (n.d.) 272–279. doi:10.1136/amiajnl-2013-002151.

[7] H. Armstrong, Machines That Learn in the Wild: Machine learning capabilities, limitations and implications (Nesta report). (2016). (accessed April 1, 2017).

[8] Science and Technology Committee (House of Commons), report on robotics and artificial intelligence. (2016). (accessed April 1, 2017).

[9] C. Reed, E. Kennedy, S. Nogueira Silva, Responsibility, Autonomy and Accountability: Legal Liability for Machine Learning, Queen Mary University of London Legal Studies Research Paper. (2016).

[10] Overview of the General Data Protection Regulation (GDPR), (2017). (accessed April 1, 2017).

[11] B. Goodman, S. Flaxman, European Union regulations on algorithmic decision-making and a “right to explanation”, arXiv:1606.08813 [cs, stat] (2016). (accessed March 14, 2017).

[12] J. Burrell, How the machine “thinks”: Understanding opacity in machine learning algorithms, Big Data Soc. 3 (2016) 2053951715622512. doi:10.1177/2053951715622512.

[14] G.J. Katuwal, R. Chen, Machine Learning Model Interpretability for Precision Medicine, arXiv:1610.09045 [q-bio] (2016). (accessed March 23, 2017).

[15] A. Vellido, J. Martín-Guerrero, P.J.G. Lisboa, Making machine learning models interpretable, in: Proc. Eur. Symp. Artif. Neural Netw. Comput. Intell. Mach. Learn., 2012.

[16] F. Doshi-Velez, B. Kim, Towards A Rigorous Science of Interpretable Machine Learning, arXiv:1702.08608 [cs, stat] (2017). (accessed March 23, 2017).

[17] Z.C. Lipton, The Mythos of Model Interpretability, arXiv:1606.03490 [cs, stat] (2016). (accessed March 22, 2017).

