15-08 Artificial Intelligence

Artificial Intelligence (AI) is a term that has replaced some earlier keywords but overlaps with many others. It describes a concept on everyone's lips, yet everyone seems to mean something different by it.

To define it clearly, AI can be classified according to its goals:

(1) Strong AI, which aims to build machines that think;

(2) Cognitive Simulation, in which computers are used to test theories about how the human mind works, for example theories about how people recognize faces or recall memories; and

(3) Applied AI, also known as advanced information processing, which aims to produce commercially viable 'smart' systems, for example expert medical diagnosis systems such as supervised or unsupervised computer-assisted detection or diagnosis (CAD) [⇒ Alaux 1990] or machine (deep) learning (ML) [⇒ Montagnon 2020].

Software designs offered for medical imaging are not genuine AI, but rather basic or sophisticated CAD or ML systems. Machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. In radiology, the aim is for more routine imaging, including diagnosis and reporting, to be done in an automated way.

For this purpose, four prerequisites must be met:

data of sufficient quantity and quality,
a powerful algorithm,
a narrowly defined task area,
a concrete goal to be achieved.

Of the four prerequisites, sufficient amounts of data will be easily available; however, their quality is and will remain imprecise, inadequate, and often irreproducible, as described for instance by Lloret [⇒ Lloret 2021]:

“One of the problems comes from the variability of the data itself (e.g., contrast, resolution, signal-to-noise) which make the Deep Learning models suffer from a poor generalization when the training data come from different machines (different vendor, model, etc.) with different acquisition parametrization or any underlying component that can cause the data distribution to shift.”
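
The effect described in this quotation can be illustrated with a deliberately simple sketch. It is not taken from the cited work: the two "scanners", the intensity values, the gain offset of 30 units, and all function names are assumptions made only for this illustration. A threshold classifier is fitted on synthetic data from one scanner and then applied, unchanged, to data from a second scanner whose intensities are shifted.

```python
import random
from statistics import mean


def simulate_scanner(rng, n_per_class, offset):
    """Return synthetic (intensity, label) pairs; label 1 = lesion, 0 = healthy.

    The class means (100 and 140), the spread (10) and the scanner-specific
    offset are invented numbers, not measurements.
    """
    data = []
    for _ in range(n_per_class):
        data.append((rng.gauss(100.0 + offset, 10.0), 0))
        data.append((rng.gauss(140.0 + offset, 10.0), 1))
    return data


def fit_threshold(train):
    """'Learn' a decision threshold as the midpoint between the class means."""
    healthy = mean(x for x, y in train if y == 0)
    lesion = mean(x for x, y in train if y == 1)
    return (healthy + lesion) / 2.0


def accuracy(data, threshold):
    """Fraction of samples for which 'intensity > threshold' matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in data)
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_a = simulate_scanner(rng, 500, offset=0.0)   # training data, scanner A
test_a = simulate_scanner(rng, 500, offset=0.0)    # new cases, same scanner
test_b = simulate_scanner(rng, 500, offset=30.0)   # new cases, shifted scanner B

threshold = fit_threshold(train_a)
acc_a = accuracy(test_a, threshold)
acc_b = accuracy(test_b, threshold)

print(f"learned threshold:      {threshold:.1f}")
print(f"accuracy on scanner A:  {acc_a:.3f}")
print(f"accuracy on scanner B:  {acc_b:.3f}")
```

In this toy setting the model performs well on new cases from the scanner it was trained on and markedly worse on the shifted scanner, although the simulated "anatomy" is identical. Real imaging data and deep-learning models are far more complex, but the mechanism, a shift in the data distribution that the model never saw during training, is the same.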

Moreover, it is well known that effects created by the equipment can subtly yet significantly affect machine learning [⇒ Ferrari 2020]. This holds for both quantification and detection, the most common AI/ML applications for which prospective vendors seek FDA approval. We have discussed the pitfalls of such quantifications earlier [⇒ Rinck 2017; 2021].

Suitable algorithms will be obtainable, yet each AI vendor is producing its own AI variants. Some will be better than others, and all will presumably deliver slightly or distinctly different results.

As far as the task areas and concrete goals are concerned, there will be dozens, perhaps hundreds, of different software packages for different organs or diagnostic questions. There won't be one general algorithm based on training datasets that span the whole human body, with all its variations from children to old people [⇒ Gauriau 2021], and that cover sufficient geographic locations representing diverse cohorts [⇒ Kaushal 2020, ⇒ Wu 2021].

Critical Remarks. In the end, the software should be able to draw inferences relevant to the solution of the particular task or situation. Validation of the CAD and ML systems is often missing [⇒ Rinck 2018; 2019]. As one example among many, a group from the University of Cambridge scrutinized several thousand papers published during the COVID pandemic and concluded:

“Despite the huge efforts of researchers to develop machine learning models for COVID-19 diagnosis and prognosis, we found methodological flaws and many biases throughout the literature, leading to highly optimistic reported performance.” [⇒ Roberts 2021]
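
How a methodological shortcut can produce overly optimistic figures may be shown with a small sketch. It is not a reconstruction of the specific flaws reported by the Cambridge group; it illustrates only one well-known pitfall, namely scoring a model on the same cases it was trained on. The synthetic data, the class means, and the function names are assumptions chosen for this illustration.

```python
import random


def sample(rng, n_per_class):
    """Return synthetic (feature, label) pairs from two strongly overlapping classes."""
    data = []
    for _ in range(n_per_class):
        data.append((rng.gauss(100.0, 20.0), 0))
        data.append((rng.gauss(120.0, 20.0), 1))
    return data


def predict_1nn(train, x):
    """Predict the label of the training case whose feature value is closest to x."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Fraction of cases in 'data' that the 1-nearest-neighbour rule labels correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in data)
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = sample(rng, 200)
held_out = sample(rng, 200)  # independent cases never seen during training

# Scoring on the training cases: each case is its own nearest neighbour,
# so the score is perfect unless two cases share exactly the same value.
resubstitution_acc = accuracy(train, train)

# Scoring on independent cases gives a more honest estimate.
held_out_acc = accuracy(train, held_out)

print(f"accuracy on training cases:    {resubstitution_acc:.3f}")
print(f"accuracy on independent cases: {held_out_acc:.3f}")
```

The first figure looks excellent and says nothing about clinical usefulness; only the second reflects how the model behaves on cases it has not seen. Even this held-out estimate is still drawn from the same simulated source, which is weaker than validation on data from other institutions and machines.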

Reliable validation will take decades; it seems nearly impossible, because the parameters of most digital radiological examinations are not exactly reproducible. However, extremely thorough validation must take place before AI algorithms are clinically feasible.

AI in medicine, and particularly in medical imaging, has long slipped out of dependable scientists' control. Judging by the publications and talks at meetings, there are more unqualified than qualified contributions. As with the frenzied hype around functional imaging (fMRI), which led to some 40,000 fMRI papers of ‘questionable validity’ [⇒ Eklund 2016], it is to be feared that the way applied AI is used in medical imaging carries an analogous risk.