The main goal of machine learning models in healthcare is to estimate patient risk accurately. To be practical, a model needs not only good discriminative performance but also good calibration, so that the predicted probabilities are meaningful and interpretable. Model development therefore starts by defining exactly which probability is to be estimated.
When making predictions for patients, we often encounter outcomes that depend on other conditions. For example, we may want to predict quality of life two years into survivorship, but this measurement exists only if the patient survives those two years. Traditionally, patients who did not survive are excluded from model development, which can introduce bias.
In the BD4QoL project, we developed a methodology based on conformal prediction that includes all patients in the model development process, making the algorithm less biased and fairer. For every patient, this approach lets us predict not only the probability of having a poor quality of life in the future but also the probability of surviving and the joint probability of experiencing any adverse event. This new method can be valuable for informing patients, clinicians, and hospital management. By knowing which patients will need help to improve their quality of life or will require other specialised care, resources can be allocated more efficiently.
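To make the relationship between these probabilities concrete, the sketch below combines a survival probability with a conditional probability of poor quality of life using the law of total probability. It is only an illustration and not the BD4QoL conformal prediction method: the function name, the dictionary keys, and the input values (0.9 and 0.2) are hypothetical, and "any adverse event" is assumed here to mean either death or survival with poor quality of life.

```python
def adverse_event_probabilities(p_survive: float, p_poor_qol_given_survival: float) -> dict:
    """Combine survival and conditional quality-of-life probabilities.

    Hypothetical helper for illustration only; it assumes the two inputs
    are already well-calibrated probabilities for the same patient.
    """
    for p in (p_survive, p_poor_qol_given_survival):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    p_death = 1.0 - p_survive
    # Poor quality of life can only be observed if the patient survives.
    p_alive_poor_qol = p_survive * p_poor_qol_given_survival
    p_alive_good_qol = p_survive * (1.0 - p_poor_qol_given_survival)

    return {
        "death": p_death,
        "alive_poor_qol": p_alive_poor_qol,
        "alive_good_qol": p_alive_good_qol,
        # Assumption: an adverse event is death OR survival with poor quality of life.
        "any_adverse_event": p_death + p_alive_poor_qol,
    }


# Made-up example values, not project results.
risks = adverse_event_probabilities(p_survive=0.9, p_poor_qol_given_survival=0.2)
for outcome, probability in risks.items():
    print(f"{outcome}: {probability:.2f}")
```

The three mutually exclusive outcomes (death, alive with poor quality of life, alive with good quality of life) sum to one, which is why no patient has to be excluded when the target is defined this way.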
This result was made possible only by the joint effort of partners who shared their valuable data across borders, including Istituto Nazionale dei Tumori, the University of Mainz, and the University of Bristol, with researchers at the University of Oslo and the University of Deusto. The peer-reviewed publication will soon be available to the whole community.