Saturday 21 November 2020

On the regulatory validation of AI models


Accepted epistemology suggests that a theory can never be confirmed, since this would require infinite tests, but only falsified. So we can never say a theory is true, only that it has not been disproved so far.  However, falsification attempts are not made at random: they are purposely crafted to probe every possible weakness. So the process works empirically: all theories that resisted falsification for some decades were not falsified subsequently, at most extended. For example, special relativity addressed the special case of bodies travelling close to the speed of light, but it did not truly falsify the second law of dynamics.  In fact, if you write Newton’s law as F = dp/dt, where p is the momentum, even the case where the mass varies due to relativistic effects is included.
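To make this explicit (a standard textbook relation, stated here only to support the point above):

```latex
F \;=\; \frac{dp}{dt}, \qquad p \;=\; \gamma\, m\, v \;=\; \frac{m\,v}{\sqrt{1 - v^2/c^2}}
```

For v much smaller than c, the factor gamma tends to 1 and the law reduces to the familiar F = m dv/dt (for constant mass), so special relativity extends Newton's law rather than falsifying it.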

But once a theory has resisted extensive attempts at falsification, for all practical purposes we assume it is true, and use it to make predictions.  However, our predictions will be affected by some error, not because the theory is false, but because of how we use it to make predictions.  An accepted approach suggests that the prediction error of a mechanistic model can be described as the sum of the epistemic error (due to our imperfect application of the theory to the physical reality being predicted), the aleatoric error (due to the uncertainty affecting all the measured quantities we use to inform the model), and the numerical solution error, present only when the equations that describe the model are solved numerically.  For mechanistic models, based on theories that have resisted extensive falsification, validation simply means the quantification of the prediction error, ideally in all three of its components (which gives rise to the verification, validation and uncertainty quantification (VV&UQ) process).
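In symbols (notation mine, not taken from any specific standard), the decomposition reads:

```latex
e_{\text{total}} \;=\; e_{\text{epistemic}} \;+\; e_{\text{aleatoric}} \;+\; e_{\text{numerical}}
```

Roughly speaking, verification addresses the numerical term, validation the epistemic term, and uncertainty quantification the aleatoric term.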

A phenomenological model is defined as a predictive model that does not use any prior knowledge to make predictions, but only prior observations (data). Analytical AI models are a type of phenomenological model. When we talk about validation for a phenomenological model, we do not simply mean the quantification of the prediction error; since the phenomenological model contains an implicit theory, its validation is closer to the falsification of a theory. And while an explicitly formulated theory can be purposely attacked in our falsification attempts, the implicit nature of phenomenological models forces us to use brute-force approaches to falsification.  This brings us to the curse of induction: a phenomenological model is never validated; we can only say that, with respect to the validation sets we have used to challenge it so far, the model has resisted our falsification attempts. But in principle nothing guarantees us that at the next validation set the model will not be proven totally wrong.

Following this line of thought, one would conclude that locked AI models cannot be trusted.  The best we can do is to formulate AI testing as a continuous process: as new validation sets are produced, the model must be retested again and again, and at most we will be able to say it is valid “so far”.

But the world is not black and white.  For example, while purely phenomenological models do exist, purely mechanistic models do not. A simple way to prove this is to consider that, since the space-time resolution of our instruments is finite, every mechanistic model has some limits of validity imposed by the particular space-time scale at which we model the phenomenon of interest.  The second law of dynamics is no longer strictly valid near the speed of light, and it also shakes at the quantum scale.  To address this problem, virtually EVERY mechanistic model must include two phenomenological portions: one that describes everything bigger than our scale as boundary conditions, and one that describes everything smaller than our scale as constitutive equations. All this is to say that there are black-box models and grey-box models, but no white-box models. At most light grey models.  So what?  Well, if every model includes some phenomenological portion, in theory VV&UQ cannot be applied, for the arguments above. But VV&UQ works, and we trust our lives to airplanes and nuclear power stations because it works.

Which brings us to another issue. Above I wrote: “in principle nothing guarantees us that at the next validation set the model will not be proven totally wrong”. Well, this is not quite true.  If the phenomenological model is predicting a physical phenomenon, we can postulate some properties.  One very important property, which comes from the conservation principles, is that all physical phenomena show some degree of regularity. If Y varies with X, and for X = 1, Y = 10, and for X = 1.0002, Y = 10.002, then when X = 1.0001 we can safely state that it is impossible that Y = 100,000, or 0.0003.  Y must have a value in the order of 10, because of the inherent regularity of physical processes.  Statisticians recognise this from another (purely phenomenological) perspective: any finite sample of a random variable might be non-normally distributed, but as we sum more and more samples the distribution of the sum, and thus of the mean, tends to normal (the central limit theorem).  This means that my estimate of an average value will converge asymptotically to the true average value, the one associated with an infinite sample size.
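A minimal numerical sketch of this convergence (the exponential distribution here is a hypothetical stand-in for any skewed, decidedly non-normal quantity):

```python
import random
import statistics

random.seed(42)

# Hypothetical, strongly non-normal (skewed) distribution with true mean 1.0
def one_sample():
    return random.expovariate(1.0)

# Sample means converge to the true mean as the sample size grows
for n in (10, 1_000, 100_000):
    mean = statistics.mean(one_sample() for _ in range(n))
    print(f"n = {n:>7}: mean = {mean:.3f}")
```

With a handful of samples the estimate wanders; with enough of them it settles close to the true value of 1.0, exactly as the central limit theorem promises.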

Thus, we can say that the estimate of the average prediction error of a phenomenological model will converge asymptotically to the true average prediction error as we increase the number of validation sets.  This means that once the number of validation sets is large enough, the estimate stabilises, its fluctuations shrinking steadily as the sample grows. This makes it possible to reliably estimate an upper bound for the prediction error, even with a finite number of validation sets.
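As a sketch of what this could look like in practice (the errors are synthetic and the numbers invented; the 1.96 factor is the usual 95% quantile under a normal approximation):

```python
import math
import random
import statistics

random.seed(0)

# Synthetic absolute prediction errors, one per validation set (hypothetical
# model whose true average error is about 2.0)
errors = [abs(random.gauss(2.0, 0.5)) for _ in range(500)]

def upper_bound_95(sample):
    """Mean error plus a ~95% upper confidence margin (normal approximation)."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)
    return m + 1.96 * s / math.sqrt(len(sample))

# The bound tightens toward the true average error as validation sets accumulate
for n in (20, 100, 500):
    print(f"after {n:>3} validation sets: upper bound = {upper_bound_95(errors[:n]):.3f}")
```

The bound is always above the running mean, and the margin shrinks as the square root of the number of validation sets, which is what allows a finite testing campaign to yield a defensible upper bound.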

It is too early in the day to come to any conclusion on whether, and how, the credibility of AI-based predictors can be evaluated from a regulatory point of view.  Here I have tried to show some aspects of the debate. But personally, I am optimistic: I believe we can reliably estimate the predictive accuracy of all models of physical processes, including those that are purely phenomenological.

A final word of caution: this is definitely not true when the AI model is trying to predict a phenomenon affected by non-physical determinants. Predictions involving psychological, behavioural, sociological, political or economic factors cannot rely on the inherent properties of physical systems, and thus, in my humble opinion, such phenomenological models can never be truly validated.  I can probably validate an AI-based model that predicts walking speed from the measured acceleration of the body's centre of mass, but not a model that predicts whether a subject will go out walking today.


Saturday 25 April 2020

Fight the good fight: The Wind Rises

It has been a long time since I last posted anything on my private blog.  Restarting from scratch in Bologna, after seven years in Sheffield, has kept me quite busy lately.  But yesterday I felt the need, once again, to express some private ideas.

The trigger was that I watched an anime, an animation movie, written and directed by Hayao Miyazaki: The Wind Rises. This 2013 movie tells the story of a young Japanese aeronautical engineer, who develops amazing new airplanes while his country prepares for WWII, and whose wife dies of tuberculosis.  It is a beautiful story, told with absolute grace and a total lack of rhetoric.

But the real trigger was a single scene, where the main character travels by train to reach his young wife, whose condition has taken a turn for the worse. While he travels, worried sick for his beloved, he cries over the sheets of calculations he is doing for his new, also beloved, airplane.

There are three themes, distinct but entangled, that developed in my head while I was watching.

The first is the fortune of having a true calling.  To warn my students of the risks our work poses in terms of mental health, I always insist that greatness requires obsession, but obsession damages your life; every researcher needs to find a balance between the two.  But watching The Wind Rises reminded me of the many times in my life when my heart was broken and my science was there for me, ready to absorb me entirely, taking me away from mundane pains. As Jiro (the main character) travels, he cries; his pains are not forgotten, but he keeps working with his slide rule (slipstick) to finish his calculations.  And of all callings, being a true engineer is an amazing one. Gianni Caproni, the Italian aeronautical engineer, says in one of Jiro's dreams: “But remember this, Japanese boy... airplanes are not tools for war. They are not for making money. Airplanes are beautiful dreams. Engineers turn dreams into reality.”

The second theme is the social responsibility of engineers. Jiro understands that his beautiful plane will feed the imperialistic expansionism of the Samurai class, but somehow separates himself from this: he is turning his dream into reality, and what other people will do with it is not his responsibility. Today is April 25th, which in Italy is the anniversary of the liberation from nazi-fascism.  WWII claimed 70-80 million lives; besides Hiroshima and Nagasaki, we could remember the firebombing of Tokyo in March 1945, which killed over 100,000 people, most of them burned alive. If you visit Tokyo, go to the Edo-Tokyo Museum: there is a whole section on this horror.  You cannot be a good engineer if you are not a humanist, who trusts that humanity will eventually put our discoveries to good use.  But when the link between your research and military applications is so evident, as is your government's intention to use it for aggression, I think we should not forget that, in addition to the moral obligation to our dreams, we also have the obligations that come from being citizens and humans.  So let me say it in words of one syllable: I believe Jiro (both the fictional character, and the real Jiro Horikoshi, who designed the Mitsubishi A6M Zero fighters) was wrong to continue his work knowing what he knew.

The third theme, totally unrelated (or maybe not), is a reflection on tuberculosis (TB).  To date the coronavirus has killed nearly 200,000 people worldwide.  I could not find any serious projection, but let us say that one year after its start we will have three times this number, say 600,000 deaths.  The world is mobilised; every single research funding agency is rerouting money to support Covid-19 research.  But what about TB, which kills over 1.8 million people worldwide every year?  I am all for this renewed attention to communicable diseases, but please do not forget those that have been around for a while, only because they are not common in developed countries. In 2016, malaria cost 63 million DALYs (Disability Adjusted Life Years), HIV 59 million, TB 45 million, and other communicable diseases 23 million.  This is nearly 200 million years of life lost to these diseases.  TB is a horrible disease: it infects you but remains silent until you are weaker or older, and then it strikes you down.  The Bacille Calmette-Guérin vaccine has been around since 1921, but it has not eradicated the disease in many countries.

To quote the Bard, “we few, we happy few, we band of brothers”: scientists and engineers of the world, we must be happy for our calling, because we turn dreams into reality.  But we must channel this calling for the good of humanity, and when there is a risk that our discoveries will be used unethically, we must say no.  Instead, we have to turn our dreams, our creative energies, toward fighting the good fight, for example ridding the world of ALL communicable diseases affecting humans anywhere.