Tuesday 14 June 2022

Credibility of predictive data-driven vs knowledge-driven models: a layperson explanation

In science, the concept of truth has a meaning quite different from its colloquial one. Scientists observe a natural phenomenon and formulate different hypotheses on why things happen in the way we observe them. There are many ways to formulate such hypotheses, but the preferred one is to express them in quantitative, mathematical terms, which makes it easier to test whether they are well founded.

Once a certain hypothesis is made public, all scientists investigating the same natural phenomenon start to design experiments that could demonstrate that the hypothesis is indeed wrong. It is only when all possible attempts have been made, and the hypothesis has resisted all such attempts to prove it wrong, that we can call it a “scientific truth”. What that means is “so far no one could prove it wrong, so we temporarily assume it to be true”.

Achieving a scientific truth is a long and costly process, but it is worth it: once a hypothesis becomes a scientific truth, or as we will call it from now on, scientific knowledge, it can be used to make predictions on how to best solve problems related to the natural phenomenon it refers to. At the risk of oversimplifying, physics aims to produce new scientific knowledge, which engineering uses to solve the problems of humanity. 

For the purpose of this note, it is important to stress that the mathematical form chosen to express the hypothesis cannot contradict the pre-existing scientific knowledge accumulated so far. For example, we are quite sure that matter/energy cannot be created or destroyed, but only transformed; this is called the law of conservation in physics. Hence, any mathematical form we use to express a scientific hypothesis must not contradict the law of conservation.


But the need to solve humanity's problems cannot wait for all the necessary scientific knowledge to become available, considering that it may take scientists even centuries to produce it. Thus, scientists have developed methods that can be used to solve practical problems even when no such knowledge is available, as long as there is plenty of quantitative data obtained from observing the phenomenon of interest. When the necessary scientific knowledge is available, we solve problems by developing predictive models based on that knowledge; otherwise, we use models developed only from observational data. We call the first type knowledge-driven models, and the second type data-driven models. The first type includes, for example, models built from the scientific knowledge provided by physics, chemistry, and physiology. Data-driven models include statistical models and the so-called Artificial Intelligence (AI) models (e.g. machine-learning models).
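To make this distinction concrete, here is a minimal Python sketch, purely illustrative and not part of the original argument, of the two kinds of model applied to a toy problem: predicting how long an object takes to fall from a given height. The knowledge-driven model is built directly from the known law of motion; the data-driven model only fits a generic curve to observed data.

    import numpy as np

    G = 9.81  # gravitational acceleration in m/s^2 (established scientific knowledge)

    def knowledge_driven_fall_time(height_m):
        # Built from Newton's laws of motion: t = sqrt(2 * h / g)
        return np.sqrt(2.0 * height_m / G)

    # A data-driven model only sees observations: pairs of (height, measured fall time).
    observed_heights = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
    observed_times = np.array([0.45, 0.64, 1.01, 1.43, 2.02])  # stand-ins for real measurements

    # Fit a generic curve to the data; no physical law is used at any point.
    fit_coefficients = np.polyfit(observed_heights, observed_times, deg=2)

    def data_driven_fall_time(height_m):
        return np.polyval(fit_coefficients, height_m)

Near the observed heights the two models give similar answers, but only the first one is anchored to pre-existing scientific knowledge.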


Now, if the problem at hand is critical (for example, when a wrong solution may threaten people's lives), before we use a model to solve it we need to be fairly sure that its predictions are credible, which means sufficiently close to what actually happens in reality. Thus, for critical problems, assessing the credibility of a model is vital. Most problems related to human health are critical, so it should not be a surprise that assessing the credibility of predictive models is a very serious matter in this domain. Unfortunately, assessing the credibility of a data-driven model turns out to be very different from assessing the credibility of a knowledge-driven model. While the precise explanation of why these are different is quite convoluted and requires a solid grasp of mathematics, here we provide a layperson explanation, aimed at all healthcare stakeholders who by training do not have such a mathematical background, but still need to make decisions on the credibility of models.


In order to quantify the error made by a predictive model, we need to observe the phenomenon of interest in a particular condition, measure the quantities of interest, then reproduce the same condition with the model, and compare the quantities it predicts to those measured experimentally. Of course, this can be done only in a finite number of conditions; but how can we be sure that our model will continue to show the same level of predictive accuracy when we use it to predict the phenomenon in a condition different from those we tested? Here is where the difference in how the model was built plays an important role.
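In code, that comparison between predictions and measurements amounts to something like the sketch below; the model, the tested conditions, and the measured values are placeholders, only meant to show how the error is quantified over a finite set of tests.

    import numpy as np

    def prediction_errors(model, tested_conditions, measurements):
        # Run the model in each experimentally tested condition and compare its
        # prediction with the quantity actually measured in that condition.
        predictions = np.array([model(c) for c in tested_conditions])
        return np.abs(predictions - np.array(measurements))

    def toy_model(condition):             # placeholder predictive model
        return 2.0 * condition

    tested_conditions = [1.0, 2.0, 3.0]   # the finite set of conditions we could test
    measurements = [2.1, 3.9, 6.2]        # illustrative measured values, not real data

    worst_error = prediction_errors(toy_model, tested_conditions, measurements).max()
    print(worst_error)   # by itself, this says nothing about conditions we did not test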

For knowledge-driven models it can be demonstrated that the mathematical forms chosen to express that knowledge, forms that must be compatible with all pre-existing scientific knowledge, ensure that if the model makes a prediction for a condition close to one we tested, its prediction error will also be close to the one quantified in that test. This allows us to assume that, once we have quantified the prediction error for a sufficiently large number of conditions within a range, the prediction error will remain comparable for any other condition within that range. The benefit of this is that for knowledge-driven models we can conduct a properly designed validation campaign, at the end of which we can state the credibility of such a model with sufficient confidence.

However, this is not true for data-driven models.  In theory, a data-driven model could be very accurate for one condition, and totally wrong for another close to the first one.  So, the concept of credibility cannot be stated once and for all.  Assessing the credibility of a data-driven model is a continuous process; while we use the model, we periodically need to confirm that the predictive accuracy remains within the acceptable limits, by comparing the model’s predictions to new experimental observations.
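As a rough sketch of what this continuous assessment could look like in practice (the tolerance, the model, and the data are all hypothetical; a real procedure would be dictated by the application):

    import numpy as np

    ACCEPTABLE_ERROR = 0.05  # hypothetical tolerance, set by how critical the problem is

    def still_credible(model, new_conditions, new_measurements):
        # Compare the data-driven model's predictions with observations collected
        # while it is in use, and check the worst error against the tolerance.
        predictions = np.array([model(c) for c in new_conditions])
        worst_error = np.abs(predictions - np.array(new_measurements)).max()
        return worst_error <= ACCEPTABLE_ERROR

If such a check fails, the model's accuracy has drifted outside the acceptable limits, and the model must be re-validated or rebuilt on newer data before it is used again.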


To further complicate the matter, sometimes a model is composed of multiple parts, some built using a data-driven approach, others built using a knowledge-driven approach. In such complex cases the model must be decomposed into sub-models, and each needs to be assessed in terms of credibility in the way most appropriate for its type.
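A minimal sketch of that decomposition, under the assumption that each sub-model is simply tagged with the approach used to build it, might look like this (the sub-model names are invented for illustration):

    # Hypothetical composite model: each part is tagged with how it was built.
    composite_model = {
        "tissue_mechanics": "knowledge-driven",   # e.g. a physics-based solver
        "risk_classifier": "data-driven",         # e.g. a machine-learning classifier
    }

    for name, model_type in composite_model.items():
        if model_type == "knowledge-driven":
            print(name, "-> validate once with a properly designed validation campaign")
        else:
            print(name, "-> keep re-assessing against new observations while in use")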


In conclusion, when no scientific knowledge is available for the phenomenon of interest, only data-driven models can be used. In that case, credibility assessment is a continuous process, much like quality assessment: while the model is in use, we periodically need to reassess its predictive accuracy against new observational data. By contrast, when scientific knowledge is available, knowledge-driven models are preferable, because their credibility can be confirmed with a finite number of validation experiments.