Artificial Intelligence (AI) is a term that has replaced some earlier keywords but overlaps with many others. It describes a concept on everyone's lips, yet everyone seems to mean something different by it.
To define it clearly, AI can be classified according to its goals:
(1) Strong AI, which aims to build machines that think;
(2) Cognitive Simulation, in which computers are used to test theories about how the human mind works, for example theories about how people recognize faces or recall memories; and
(3) Applied AI, also known as advanced information processing, which aims to produce commercially viable ‘smart’ systems, for example expert medical diagnosis systems such as supervised or unsupervised computer-assisted detection or diagnosis (CAD) [⇒ Alaux 1990] or machine (deep) learning (ML) [⇒ Montagnon 2020].
Software designs offered for medical imaging are not genuine AI, but rather basic or sophisticated CAD or ML systems. Machine learning is concerned with the question of how to construct computer programs that automatically improve with experience; a small illustrative sketch of this idea follows the list below. In radiology, the aim is to automate more of routine imaging, including diagnosis and reporting.
For this purpose four prerequisites must be met:
(1) data of sufficient quantity and quality,
(2) a powerful algorithm,
(3) a narrowly defined task area, and
(4) a concrete goal to be achieved.
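To make the notion of a program that ‘improves with experience’ tangible, here is a small illustrative sketch. It uses synthetic data and the scikit-learn library rather than radiological images, and is meant only to show how a model's performance on held-out cases typically rises as the amount of training data grows; the feature set and sample sizes are arbitrary assumptions.

```python
# Minimal sketch, not a clinical system: synthetic features stand in for an
# imaging feature set; the point is only that test accuracy tends to improve
# as the model sees more training data ("experience").
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data (e.g. 'lesion' vs. 'normal'); purely illustrative.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# "Experience" grows: train on progressively larger subsets, evaluate on the
# same held-out test set each time.
for n in (50, 200, 1000, len(X_train)):
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"trained on {n:4d} cases -> test accuracy {acc:.3f}")
```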
Of the four prerequisites, data in sufficient quantity will be easily available; its quality, however, is and will remain imprecise, inadequate, and often irreproducible, as described for instance by Lloret [⇒ Lloret 2021]:
“One of the problems comes from the variability of the data itself (e.g., contrast, resolution, signal-to-noise) which make the Deep Learning models suffer from a poor generalization when the training data come from different machines (different vendor, model, etc.) with different acquisition parametrization or any underlying component that can cause the data distribution to shift.”
Moreover, it is well known that effects created by the equipment can subtly yet significantly affect machine learning [⇒ Ferrari 2020]. This holds for both quantification and detection, the most common AI/ML applications for which prospective vendors seek FDA approval. We have discussed the pitfalls of such quantifications earlier [⇒ Rinck 2017; 2021].
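To illustrate the kind of shift Lloret describes, the following hedged sketch trains a toy classifier on data from one simulated ‘scanner’ and evaluates it both on that scanner and on a second one whose intensity statistics differ. All values, including the offset and gain standing in for vendor or acquisition differences, are arbitrary illustration choices, not measured effects.

```python
# Hedged sketch of distribution shift: a model fitted on "scanner A" data is
# evaluated on new A cases and on "scanner B" cases whose intensity offset and
# gain differ, a crude stand-in for vendor/acquisition differences.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_scanner_data(n, offset=0.0, gain=1.0):
    """Toy two-class 'lesion vs. normal' features with a scanner-specific
    intensity offset and gain; purely synthetic."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, 10))
    return gain * X + offset, y

# Training data come from scanner A only.
X_a, y_a = make_scanner_data(2000)
model = LogisticRegression(max_iter=1000).fit(X_a, y_a)

# Evaluation: new cases from the same scanner vs. a shifted scanner.
X_a2, y_a2 = make_scanner_data(1000)                      # scanner A, new cases
X_b, y_b = make_scanner_data(1000, offset=2.0, gain=0.5)  # scanner B, shifted
print("same-scanner accuracy:   ", accuracy_score(y_a2, model.predict(X_a2)))
print("shifted-scanner accuracy:", accuracy_score(y_b, model.predict(X_b)))
```

In this toy setup the accuracy on the shifted scanner collapses toward chance, which is the behaviour the quoted passage describes as poor generalization across machines.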
Suitable algorithms will be obtainable, yet each AI vendor is producing its own variants. Some will be better than others, and all will presumably deliver slightly or distinctly different results.
As far as the task areas and concrete goals are concerned, there will be dozens, perhaps hundreds, of different software packages for different organs or diagnostic questions. There won’t be one general algorithm based on training datasets for the whole human body, with all the variations from children to old people [⇒ Gauriau 2021], and covering sufficient geographic locations to represent diverse cohorts [⇒ Kaushal 2020; ⇒ Wu 2021].
Critical Remarks. In the end, the software should be able to draw inferences relevant to the solution of the particular task or situation. Often, however, validation of the CAD and ML systems is missing [⇒ Rinck 2018; 2019]. As one example among many, a group from the University of Cambridge scrutinized several thousand papers published during the COVID pandemic and concluded:
“Despite the huge efforts of researchers to develop machine learning models for COVID-19 diagnosis and prognosis, we found methodological flaws and many biases throughout the literature, leading to highly optimistic reported performance.” [⇒ Roberts 2021]
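One common way such optimistic figures arise is information leakage during validation. The following sketch, which uses random labels and pure noise features (so honest accuracy should sit near 50%), shows how selecting ‘predictive’ features on the full dataset before cross-validation inflates the estimate, whereas refitting the selection inside each training fold does not. The dataset sizes and the number of selected features are arbitrary illustration values.

```python
# Hedged illustration of a validation flaw: feature selection performed on the
# whole dataset leaks test-fold information into training and inflates the
# cross-validated accuracy, even when labels are random.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))   # 100 'patients', 2000 pure-noise features
y = rng.integers(0, 2, size=100)   # labels carry no real signal

clf = LogisticRegression(max_iter=1000)

# Flawed: feature selection sees the entire dataset, including future test folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(clf, X_leaky, y, cv=5).mean()

# Correct: selection is refitted inside each cross-validation training fold.
pipe = make_pipeline(SelectKBest(f_classif, k=20),
                     LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky estimate:  {leaky:.2f}")   # typically well above chance here
print(f"honest estimate: {honest:.2f}")  # close to 0.5, as it should be
```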
AI in medicine, and particularly in medical imaging, has long slipped out of the control of dependable scientists. Judging by the publications and the talks at meetings, there are more unqualified than qualified contributions. Similar to the frenzied hype around functional imaging (fMRI), which led to some 40,000 fMRI papers of ‘questionable validity’ [⇒ Eklund 2016], it is to be feared that the way applied AI is used in medical imaging carries an analogous risk.