TwinTree Insert

15-08 Artificial Intelligence


Artificial Intelligence (AI) is a term that has replaced some earlier keywords, but overlaps with a lot of other terms. It describes a concept on everyone's lips, yet everyone seems to mean something different by it.

To define it clearly, AI can be classified according to its goals:

(1) Strong AI, which aims to build machines that think;

(2) Cognitive Simulation, in which computers are used to test theories about how the human mind works, for example theories about how people recognize faces or recall memories; and

(3) Applied AI, also known as advanced information processing, which aims to produce commercially viable ‘smart’ systems, for example expert medical diagnosis systems such as supervised or unsupervised computer-assisted detection or diagnosis (CAD) [⇒ Alaux 1990] or machine (deep) learning (ML) [⇒ Montagnon 2020].


Software designs offered for medical imaging are not genuine AI, but rather basic or sophisticated CAD or ML systems. Machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. The aim in radiology is that more routine imaging tasks, including diagnosis and reporting, are performed in an automated way.
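
The standard formulation is that a program ‘learns’ when its performance on a task improves as it processes more examples. The following is a minimal, hypothetical Python sketch of that behaviour, using scikit-learn on synthetic data rather than any real imaging dataset:

```python
# Minimal sketch: a classifier whose accuracy improves with "experience",
# i.e. with the number of training examples. Synthetic data, not images.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical two-class problem with 20 numeric features per case
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1000, random_state=0)

for n in (50, 200, 1000, 4000):              # growing amount of "experience"
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:n], y_train[:n])      # train on the first n cases only
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"trained on {n:4d} cases -> test accuracy {acc:.3f}")
```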

To automate such tasks, four prerequisites must be met:

• data of sufficient quantity and quality,
• a powerful algorithm,
• a narrowly defined task area,
• a concrete goal to be achieved.


Of the four prerequisites, sufficient amounts of data will be easily available; however, their quality is and will remain imprecise, inadequate, and often irreproducible, as described for instance by Lloret [⇒ Lloret 2021]:


“One of the problems comes from the variability of the data itself (e.g., contrast, resolution, signal-to-noise) which make the Deep Learning models suffer from a poor generalization when the training data come from different machines (different vendor, model, etc.) with different acquisition parametrization or any underlying component that can cause the data distribution to shift.”


Moreover, it is well known that effects created by the equipment can subtly yet significantly affect machine learning [⇒ Ferrari 2020]. This holds for both quantification and detection, the most common AI/ML applications for which prospective vendors seek FDA approval. We have discussed the pitfalls of such quantifications earlier [⇒ Rinck 2017; 2021].
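
The distribution shift Lloret describes can be made concrete in a few lines. The sketch below is purely illustrative, with invented ‘scanner’ parameters and synthetic features standing in for image-derived measurements: a classifier trained on data from one acquisition setting loses accuracy when the intensity offset and noise level change at test time.

```python
# Illustrative only: the same two classes, but "scanner B" delivers a
# different intensity offset and noise level than the "scanner A" data
# the model was trained on.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def simulate(n, offset, noise):
    """Two classes separated along one synthetic 'image feature' axis."""
    y = rng.integers(0, 2, n)
    X = np.column_stack([
        y + offset + rng.normal(0.0, noise, n),   # signal-carrying feature
        rng.normal(0.0, 1.0, n),                  # irrelevant feature
    ])
    return X, y

X_train, y_train   = simulate(2000, offset=0.0, noise=0.5)   # scanner A
X_test_a, y_test_a = simulate(500,  offset=0.0, noise=0.5)   # same scanner
X_test_b, y_test_b = simulate(500,  offset=0.8, noise=1.2)   # shifted acquisition

model = LogisticRegression().fit(X_train, y_train)
print("same-scanner accuracy:   ", accuracy_score(y_test_a, model.predict(X_test_a)))
print("shifted-scanner accuracy:", accuracy_score(y_test_b, model.predict(X_test_b)))
```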

Suitable algorithms will be obtainable, yet each AI vendor produces its own AI variants. Some will be better than others, and all will presumably deliver slightly or distinctly different results.

As far as the task areas and concrete goals are concerned, there will be dozens, perhaps hundreds, of different software packages for different organs or diagnostic questions. There won't be one general algorithm based on training datasets for the whole human body, with all the variations from children to old people [⇒ Gauriau 2021], and covering sufficient geographic locations representing diverse cohorts [⇒ Kaushal 2020, ⇒ Wu 2021].


Critical Remarks. In the end, the software should be able to draw inferences relevant to the solution of the particular task or situation. Often, validation of the CAD and ML systems is missing [⇒ Rinck 2018; 2019]. As one example among many, a group from the University of Cambridge scrutinized several thousand papers published during the COVID pandemic and concluded:


“Despite the huge efforts of researchers to develop machine learning models for COVID-19 diagnosis and prognosis, we found methodological flaws and many biases throughout the literature, leading to highly optimistic reported performance” [⇒ Roberts 2021].


Reliable validation will take decades; it seems nearly impossible, because the parameters of most digital radiological examinations are not exactly reproducible. However, extremely thorough validation must take place before AI algorithms are fit for clinical use.
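
One safeguard behind this demand is validation on genuinely independent data, for instance leaving whole sites or scanners out of training instead of relying on a random internal split. A hypothetical sketch with invented site labels and stand-in features, using scikit-learn's GroupKFold for the site-wise splitting:

```python
# Hypothetical sketch of site-wise (external) validation: each test fold
# contains only cases from sites absent from the corresponding training fold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(1)
n_cases = 1200
X = rng.normal(size=(n_cases, 15))                 # stand-in feature matrix
y = (X[:, 0] + 0.5 * rng.normal(size=n_cases) > 0).astype(int)  # stand-in labels
sites = rng.integers(0, 6, n_cases)                # which hospital/scanner produced each case

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Leave whole sites out of training; the scores hint at how the model
# might behave on a site it has never seen.
scores = cross_val_score(model, X, y, groups=sites, cv=GroupKFold(n_splits=6))
print("accuracy per left-out site:", np.round(scores, 3))
```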


AI in medicine, and particularly in medical imaging, has long slipped out of the control of dependable scientists. Judging by the publications and talks at meetings, there are more unqualified than qualified contributions. Similar to the frenzied hype around functional imaging (fMRI), which led to some 40,000 fMRI papers of ‘questionable validity’ [⇒ Eklund 2016], it is to be feared that the way applied AI is used in medical imaging carries an analogous risk.