NVISION objectives and outcomes

43007 Tarragona, Spain

Title of Project: Privacy-preserving FL pipeline for multi-institutional collaboration
Student Name: Faisal Ahmed

Summary:

Faisal Ahmed holds a bachelor's degree in computer science and engineering from the University of Chittagong and a master's degree in computer science and engineering, with a specialization in Data Science, from United International University. His research focuses on responsible artificial intelligence methodologies, particularly privacy-preserving, secure, and explainable machine learning, deep learning, federated learning, and expert systems. He enjoys applying this knowledge to practical problems in medical imaging, digital healthcare, and computational biology. He is a Ph.D. researcher at NVISION, pursuing a doctorate in computer engineering and mathematical security at Rovira i Virgili University (URV), and an active member of the CRISES research group. His doctoral work is centered on the BosomShield Project, where he contributes to the development of a secure, privacy-preserving, and robust federated learning framework. The framework is intended to enable multi-institutional collaboration while addressing the challenges inherent in sharing sensitive data. As part of his research objectives, Faisal aims to demonstrate the real-world impact of the proposed framework by piloting it for medical image analysis, specifically for predicting breast cancer relapse.

Objectives:
To develop ML/DL models for medical image analysis based on an FL approach by removing the main barriers to multi-institutional collaboration: (a) non-IID local data sampling; (b) differences in image acquisition protocols and labeling methodologies across institutions; and (c) privacy and security concerns. The resulting ML/DL models will be tested on radiological and pathological image analysis for BC relapse prediction. To reach these objectives, the following tasks will be performed:
(1) Review the learning procedures available for setting up FL in multi-institutional collaborations (i.e., parallel training, institutional incremental learning (IIL), and cyclic institutional incremental learning (CIIL)), and select the most appropriate one for the purposes of the BosomShield project.
(2) For non-IID local data, define a mathematically grounded criterion for the optimal training iteration at which each user should start improving their individual model (personalization phase) from the shared model trained by all users (collaboration phase), so that individual model performance is uniform regardless of whether local data are IID or non-IID.
(3) Investigate privacy-preserving federated learning algorithms to ensure that medical/personal data cannot be reconstructed by the model manager or an external intruder, design security mechanisms to prevent model poisoning by malicious participants, and make sure that model accuracy does not suffer from the privacy and security defenses.
(4) Evaluate the performance of the developed FL model on the classification of molecular subtypes of BC developed by DC1 (URV): the publicly available INbreast (115 cases) and DDSM (1168 cases) datasets will be used to build the shared model, and the multi-institutional radiological images from our partners HUSJR, KTH, and UKCM will be used to update the weights of the shared model; the resulting model will also be tested on data from RADC not seen during training.
(5) Apply the developed federated learning procedure to pathological images, using the DL models of the other DCs as shared models, to predict BC relapse.
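To make the learning procedures named in task (1) more concrete, the following is a minimal sketch contrasting parallel training (here assumed to be FedAvg-style averaging weighted by sample count) with a sequential pass through institutions, where one pass corresponds to IIL and repeated passes to CIIL. It is an illustration under stated assumptions, not the BosomShield implementation: the toy linear model, the two synthetic sites, and the function names `local_train`, `parallel_round`, and `cyclic_pass` are all hypothetical.

```python
from typing import List, Tuple

Weights = Tuple[float, float]        # (slope, intercept) of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_train(weights: Weights, data: Dataset,
                lr: float = 0.05, steps: int = 20) -> Weights:
    """Run a few gradient-descent steps on mean squared error, local data only."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def parallel_round(global_weights: Weights, sites: List[Dataset]) -> Weights:
    """Parallel training: every site starts from the shared model, and the
    local results are averaged, weighted by each site's sample count."""
    updates = [local_train(global_weights, data) for data in sites]
    total = sum(len(data) for data in sites)
    w = sum(u[0] * len(d) for u, d in zip(updates, sites)) / total
    b = sum(u[1] * len(d) for u, d in zip(updates, sites)) / total
    return (w, b)


def cyclic_pass(weights: Weights, sites: List[Dataset]) -> Weights:
    """Sequential training: the model visits the sites one after another.
    A single pass corresponds to IIL; repeating the pass gives CIIL."""
    for data in sites:
        weights = local_train(weights, data)
    return weights


# Two hypothetical sites sampling y = 2x + 1 over different input ranges,
# a crude stand-in for non-IID local data. Only weights leave a site.
site_a = [(x, 2 * x + 1) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
site_b = [(x, 2 * x + 1) for x in (1.0, 1.25, 1.5, 1.75, 2.0)]
sites = [site_a, site_b]

parallel_model: Weights = (0.0, 0.0)
cyclic_model: Weights = (0.0, 0.0)
for _ in range(100):
    parallel_model = parallel_round(parallel_model, sites)
    cyclic_model = cyclic_pass(cyclic_model, sites)

print("parallel training:", parallel_model)
print("cyclic (CIIL):    ", cyclic_model)
```

In this noiseless toy setting both procedures approach the same slope and intercept; with real, heterogeneous institutional data they can behave differently, which is why task (1) reviews them before one is selected.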

Outcomes:
The main outcome of this project is a modular federated learning framework, usable in real-world healthcare settings, that is able to: 1) train high-performing and generalizable DL models in healthcare, mainly for BC relapse prediction, without privately identifiable data changing hands; 2) achieve security and privacy in federated learning systems to mitigate model attacks, data reconstruction, and disguised local model training; 3) fine-tune DL models through the FL paradigm to classify molecular subtypes of BC using radiological images and to predict relapse using radiological and pathological images; and 4) boost the performance of CAD systems for hospitals with smaller datasets.