Unraveling the deep learning gearbox in optical coherence tomography image segmentation towards explainable artificial intelligence

1.

Samuel, A. L. in Computer Games I (ed. Levy, D. N. L.) 366–400 (Springer New York, 1988).

2.

Fletcher, K. H. Matter with a mind; a neurological research robot. Research 4, 305–307 (1951).

3.

Kononenko, I. Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. 23, 89–109 (2001).

4.

Kugelman, J. et al. Automatic choroidal segmentation in OCT images using supervised deep learning methods. Sci. Rep. 9, 13298 (2019).

5.

Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation 234–241 (Springer International Publishing, Cham, 2015).

6.

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

7.

Müller, P. L. et al. in High Resolution Imaging in Microscopy and Ophthalmology: New Frontiers in Biomedical Optics (ed. Bille, J. F.) 87–106 (Springer International Publishing, 2019).

8.

Huang, D. et al. Optical coherence tomography. Science 254, 1178–1181 (1991).

9.

Mrejen, S. & Spaide, R. F. Optical coherence tomography: imaging of the choroid and beyond. Surv. Ophthalmol. 58, 387–429 (2013).

10.

Staurenghi, G., Sadda, S., Chakravarthy, U. & Spaide, R. F. Proposed lexicon for anatomic landmarks in normal posterior segment spectral-domain optical coherence tomography. Ophthalmology 121, 1572–1578 (2014).

11.

von der Emde, L. et al. Artificial intelligence for morphology-based function prediction in neovascular age-related macular degeneration. Sci. Rep. 9, 11132 (2019).

12.

Lee, C. S., Baughman, D. M. & Lee, A. Y. Deep learning is effective for classifying normal versus age-related macular degeneration OCT images. Ophthalmol. Retina 1, 322–327 (2017).

13.

Motozawa, N. et al. Optical coherence tomography-based deep-learning models for classifying normal and age-related macular degeneration and exudative and non-exudative age-related macular degeneration changes. Ophthalmol. Ther. 8, 527–539 (2019).

14.

Keel, S. et al. Feasibility and patient acceptability of a novel artificial intelligence-based screening model for diabetic retinopathy at endocrinology outpatient services: a pilot study. Sci. Rep. 8, 4330 (2018).

15.

Bellemo, V. et al. Artificial intelligence screening for diabetic retinopathy: the real-world emerging application. Curr. Diab. Rep. 19, 72 (2019).

16.

Grzybowski, A. et al. Artificial intelligence for diabetic retinopathy screening: a review. Eye 34, 451–460 (2020).

17.

Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).

18.

Arcadu, F. et al. Deep learning algorithm predicts diabetic retinopathy progression in individual patients. NPJ Digit. Med. 2, 92 (2019).

19.

Waldstein, S. M. et al. Evaluating the impact of vitreomacular adhesion on anti-VEGF therapy for retinal vein occlusion using machine learning. Sci. Rep. 7, 2928 (2017).

20.

Schlegl, T. et al. Fully automated detection and quantification of macular fluid in OCT using deep learning. Ophthalmology 125, 549–558 (2018).

21.

Zutis, K. et al. Towards automatic detection of abnormal retinal capillaries in ultra-widefield-of-view retinal angiographic exams. In Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. Vol. 2013, 7372–7375 (Osaka, Japan, 2013).

22.

Müller, P. L. et al. Prediction of function in ABCA4-related retinopathy using ensemble machine learning. J. Clin. Med. 9, 2428 (2020).

23.

De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342–1350 (2018).

24.

Maloca, P. M. et al. Validation of automated artificial intelligence segmentation of optical coherence tomography images. PLoS ONE 14, e0220063 (2019).

25.

Quellec, G. et al. Feasibility of support vector machine learning in age-related macular degeneration using small sample yielding sparse optical coherence tomography data. Acta Ophthalmol. 97, e719–e728 (2019).

26.

Darcy, A. M., Louie, A. K. & Roberts, L. W. Machine learning and the profession of medicine. JAMA 315, 551–552 (2016).

27.

Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 1–47 (2018).

28.

Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).

29.

King, B. F. Artificial intelligence and radiology: what will the future hold? J. Am. Coll. Radiol. 15, 501–503 (2018).

30.

Coiera, E. The fate of medicine in the time of AI. Lancet 392, 2331–2332 (2018).

31.

Jha, S. & Topol, E. J. Adapting to artificial intelligence: radiologists and pathologists as information specialists. JAMA 316, 2353–2354 (2016).

32.

Makridakis, S. The forthcoming artificial intelligence (AI) revolution: its impact on society and firms. Futures 90, 46–60 (2017).

33.

Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

34.

Chan, S. & Siegel, E. L. Will machine learning end the viability of radiology as a thriving medical specialty? Br. J. Radiol. 92, 20180416 (2019).

35.

Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

36.

Ferrucci, D., Levas, A., Bagchi, S., Gondek, D. & Mueller, E. T. Watson: beyond Jeopardy! Artif. Intell. 199–200, 93–105 (2013).

37.

Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit. Health 1, e271–e297 (2019).

38.

Bouwmeester, W. et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med. 9, 1–12 (2012).

39.

Collins, G. S. & Moons, K. G. M. Reporting of artificial intelligence prediction models. Lancet 393, 1577–1579 (2019).

40.

Schulz, K. F., Altman, D. G., Moher, D. & CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ 340, c332 (2010).

41.

Calvert, M. et al. Guidelines for inclusion of patient-reported outcomes in clinical trial protocols: the SPIRIT-PRO Extension. JAMA 319, 483–494 (2018).

42.

CONSORT-AI and SPIRIT-AI Steering Group. Reporting guidelines for clinical trials evaluating artificial intelligence interventions are needed. Nat. Med. 25, 1467–1468 (2019).

43.

Liu, X., Faes, L., Calvert, M. J., Denniston, A. K. & CONSORT/SPIRIT-AI Extension Group. Extension of the CONSORT and SPIRIT statements. Lancet 394, 1225 (2019).

44.

Kaiser, T. M. & Burger, P. B. Error tolerance of machine learning algorithms across contemporary biological targets. Molecules 24, https://doi.org/10.3390/molecules24112115 (2019).

45.

Beam, A. L., Manrai, A. K. & Ghassemi, M. Challenges to the reproducibility of machine learning models in health care. JAMA 323, 305–306 (2020).

46.

Ting, D. S. W. et al. Artificial intelligence and deep learning in ophthalmology. Br. J. Ophthalmol. 103, 167–175 (2019).

47.

Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).

48.

Castelvecchi, D. Can we open the black box of AI? Nature 538, 20–23 (2016).

49.

Guidotti, R. et al. A survey of methods for explaining black box models. ACM Comput. Surv. 51, 1–42 (2019).

50.

Lipton, Z. C. The mythos of model interpretability. Queue 16, 31–57 (2018).

51.

Gunning, D. & Aha, D. DARPA’s explainable artificial intelligence (XAI) program. AI Mag. 40, 44–58 (2019).

52.

Holzinger, A., Kieseberg, P., Weippl, E. & Tjoa, A. M. Current advances, trends and challenges of machine learning and knowledge extraction: from machine learning to explainable AI. In Machine Learning and Knowledge Extraction. CD-MAKE 2018. Lecture Notes in Computer Science, Vol. 11015 (eds Holzinger, A. et al.) 1–8 (Springer, Cham, 2018). https://doi.org/10.1007/978-3-319-99740-7_1.

53.

Barredo Arrieta, A. et al. Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020).

54.

Montavon, G., Samek, W. & Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 73, 1–15 (2018).

55.

Holzinger, A., Langs, G., Denk, H., Zatloukal, K. & Müller, H. Causability and explainability of artificial intelligence in medicine. WIREs Data Min. Knowl. Discov. 9, e1312. https://doi.org/10.1002/widm.1312 (2019).

56.

Holzinger, A., Carrington, A. & Müller, H. Measuring the quality of explanations: The System Causability Scale (SCS). KI Künstliche Intell. 34, 193–198 (2020).

57.

Ribeiro, M. T., Singh, S. & Guestrin, C. "Why should I trust you?": explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (ACM, 2016).

58.

Lakkaraju, H., Kamar, E., Caruana, R. & Leskovec, J. Interpretable & explorable approximations of black box models. Preprint at https://arxiv.org/abs/1707.01154 (2017).

59.

Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359, https://doi.org/10.1007/s11263-019-01228-7 (2020).

60.

Wickstrom, K., Kampffmeyer, M. & Jenssen, R. Uncertainty modeling and interpretability in convolutional neural networks for polyp segmentation. In 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP) 1–6 (IEEE, 2018).

61.

Vinogradova, K., Dibrov, A. & Myers, G. Towards interpretable semantic segmentation via gradient-weighted class activation mapping. Preprint at https://arxiv.org/abs/2002.11434 (2020).

62.

Bach, S. et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10, e0130140 (2015).

63.

Seegerer, P. et al. Interpretable deep neural network to predict estrogen receptor status from haematoxylin-eosin images. in Artificial Intelligence and Machine Learning for Digital Pathology (eds. Holzinger, A. et al.) 16–37 (Springer, Cham, 2020).

64.

Montavon, G., Lapuschkin, S., Binder, A., Samek, W. & Müller, K.-R. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit. 65, 211–222 (2017).

65.

Kim, B. et al. Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning, Vol. 80 (eds Dy, J. & Krause, A.) 2668–2677 (PMLR, Proceedings of Machine Learning Research, 2018).

66.

Moussa, M. et al. Grading of macular perfusion in retinal vein occlusion using en-face swept-source optical coherence tomography angiography: a retrospective observational case series. BMC Ophthalmol. 19, 127 (2019).

67.

Swanson, E. A. & Fujimoto, J. G. The ecosystem that powered the translation of OCT from fundamental research to clinical and commercial impact [Invited]. Biomed. Opt. Express 8, 1638 (2017).

68.

Holz, F. G. et al. Multi-country real-life experience of anti-vascular endothelial growth factor therapy for wet age-related macular degeneration. Br. J. Ophthalmol. 99, 220–226 (2015).

69.

Alshareef, R. A. et al. Segmentation errors in macular ganglion cell analysis as determined by optical coherence tomography in eyes with macular pathology. Int. J. Retina Vitreous 3, 25 (2017).

70.

Al-Sheikh, M., Ghasemi Falavarjani, K., Akil, H. & Sadda, S. R. Impact of image quality on OCT angiography based quantitative measurements. Int. J. Retina Vitreous 3, 13 (2017).

71.

Sadda, S. R. et al. Errors in retinal thickness measurements obtained by optical coherence tomography. Ophthalmology 113, 285–293 (2006).

72.

Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2015).

73.

Sinz, F. H., Pitkow, X., Reimer, J., Bethge, M. & Tolias, A. S. Engineering a less artificial intelligence. Neuron 103, 967–979 (2019).

74.

Zador, A. M. A critique of pure learning and what artificial neural networks can learn from animal brains. Nat. Commun. 10, 3770 (2019).

75.

Tajmir, S. H. et al. Artificial intelligence-assisted interpretation of bone age radiographs improves accuracy and decreases variability. Skelet. Radiol. 48, 275–283 (2019).

76.

Kellner-Weldon, F. et al. Comparison of perioperative automated versus manual two-dimensional tumor analysis in glioblastoma patients. Eur. J. Radiol. 95, 75–81 (2017).

77.

Ma, Z., Turrigiano, G. G., Wessel, R. & Hengen, K. B. Cortical circuit dynamics are homeostatically tuned to criticality in vivo. Neuron 104, 655–664.e4 (2019).

78.

Shibayama, S. & Wang, J. Measuring originality in science. Scientometrics 122, 409–427 (2020).

79.

Dirk, L. A measure of originality. Soc. Stud. Sci. 29, 765–776 (1999).

80.

Hägele, M. et al. Resolving challenges in deep learning-based analyses of histopathological images using explanation methods. Sci. Rep. 10, 6423 (2020).

81.

Panwar, H. et al. A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos Solitons Fractals 140, 110190 (2020).

82.

Anger, E. M. et al. Ultrahigh resolution optical coherence tomography of the monkey fovea. Identification of retinal sublayers by correlation with semithin histology sections. Exp. Eye Res. 78, 1117–1125 (2004).

83.

Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. 9, 249–256 (2010).

84.

Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2015).

85.

Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. https://doi.org/10.1016/j.media.2017.07.005 (2017).

86.

Kosub, S. A note on the triangle inequality for the Jaccard distance. Preprint at https://arxiv.org/abs/1612.02696 (2016).

87.

Borg, I. & Groenen, P. Modern Multidimensional Scaling (Springer New York, 1997).

88.

R Core Team. R: A Language and Environment for Statistical Computing (2019).

89.

Fay, M. P. & Shaw, P. A. Exact and asymptotic weighted logrank tests for interval censored data: the interval R package. J. Stat. Softw. 36 (2010).

90.

Maloca, P. M. et al. Unraveling the deep learning gearbox in optical coherence tomography image segmentation towards explainable artificial intelligence. Code/software v1.0. https://doi.org/10.5281/zenodo.4380269 (2020).
