SynthMed: Generating and Detecting Multimodal Deepfakes for Healthcare Communication / Barone, M.; Di Serio, F.; Moscato, V.; Postiglione, M.; Riccio, G.; Romano, A. - (2025), pp. 108-115. (27th International Symposium on Multimedia, ISM 2025, ita, 2025) [10.1109/ISM66958.2025.00034].
SynthMed: Generating and Detecting Multimodal Deepfakes for Healthcare Communication
Barone M.; Di Serio F.; Moscato V.; Postiglione M.;
2025
Abstract
The rapid rise of deep generative models has enabled realistic synthetic media, posing severe risks to healthcare communication, where trust and accuracy are essential. Existing deepfake detectors, mostly trained on entertainment-oriented datasets, degrade sharply in medical contexts, where domain-specific terminology and evidence-based discourse are critical. Moreover, post-hoc moderation is often too slow to mitigate the spread of health-related disinformation, underscoring the need for timely, domain-adapted solutions. We address these gaps by (i) curating a multimodal healthcare-oriented corpus of in-the-wild and synthetically generated deepfakes, (ii) benchmarking state-of-the-art monomodal detectors for video, audio, and text, and (iii) introducing a late-fusion framework that integrates their predictions via ensemble methods. Our experiments show that late fusion consistently outperforms individual detectors, yielding gains in macro-F1 and AUC (+0.159 and +0.131, respectively), particularly under domain-shift and low-resource conditions. These findings demonstrate the importance of multimodal integration for robust detection of health-related deepfakes and contribute toward building trustworthy AI systems to safeguard medical communication. Code and data will be released for transparency and reproducibility: https://github.com/PRAISELab-PicusLab/Deepfake-Detection-in-Healthcare
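
The late-fusion idea in the abstract can be illustrated with a minimal sketch. The abstract only says that monomodal predictions are integrated "via ensemble methods", so the weighted soft-voting rule below is an assumption and not necessarily the authors' method. The function name `late_fusion`, the modality keys, the weights, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def late_fusion(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Fuse per-modality fake probabilities by weighted averaging.

    Assumption: each monomodal detector outputs a probability in [0, 1]
    that the sample is a deepfake. A modality that is unavailable for a
    sample is passed as None and skipped; the remaining weights are
    renormalized so the fused score stays in [0, 1].
    """
    available = {m: s for m, s in scores.items() if s is not None}
    if not available:
        raise ValueError("at least one modality score is required")

    total_weight = sum(weights[m] for m in available)
    fused = sum(weights[m] * s for m, s in available.items()) / total_weight
    return fused, fused >= threshold


if __name__ == "__main__":
    # Hypothetical detector outputs for a single clip.
    clip_scores = {"video": 0.9, "audio": 0.4, "text": 0.8}
    equal_weights = {"video": 1.0, "audio": 1.0, "text": 1.0}
    print(late_fusion(clip_scores, equal_weights))
```

In this sketch a confident video or text detector can outvote an uncertain audio detector, which is one plausible reason a fused decision could be more robust than any single modality. In practice the weights would be tuned on held-out data or replaced by a learned meta-classifier.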


