Thanks to recent advances in deep learning, sophisticated generation tools now exist that produce extremely realistic synthetic speech. However, malicious uses of such tools are possible and likely, posing a serious threat to our society. Hence, synthetic voice detection has become a pressing research topic, and a large variety of detection methods have recently been proposed. Unfortunately, they hardly generalize to synthetic audio generated by tools never seen in the training phase, which makes them unfit to face real-world scenarios. In this work we aim to overcome this issue by proposing a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific attacks. Since the detector is trained only on real data, generalization is automatically ensured. The proposed approach can be implemented with off-the-shelf speaker verification tools, and we test several such solutions on three popular test sets, obtaining good performance, high generalization ability and high robustness to audio impairment.
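
The idea of checking a recording against the biometric profile of the claimed speaker can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the scoring rule, so the code assumes that some off-the-shelf speaker verification model has already turned each recording into a fixed-length embedding vector, and that a recording is flagged when its best cosine similarity to a set of genuine reference recordings falls below a threshold. The function names, the max-similarity rule and the threshold value are all hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def identity_score(
    test_embedding: Sequence[float],
    reference_embeddings: Sequence[Sequence[float]],
) -> float:
    """Best match between the test recording and the genuine reference set.

    Assumption: the maximum similarity is used; other aggregations
    (mean, centroid distance) would be equally plausible.
    """
    if not reference_embeddings:
        raise ValueError("at least one reference embedding is required")
    return max(cosine_similarity(test_embedding, ref) for ref in reference_embeddings)


def is_flagged_as_fake(
    test_embedding: Sequence[float],
    reference_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical value; would be tuned on real data only
) -> bool:
    """Flag the recording when it does not match the claimed speaker well enough."""
    return identity_score(test_embedding, reference_embeddings) < threshold
```

Because the decision relies only on genuine recordings of the claimed speaker, nothing in this sketch depends on a specific synthesis tool, which is the property the abstract emphasizes.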

Deepfake audio detection by speaker verification / Pianese, Alessandro; Cozzolino, Davide; Poggi, Giovanni; Verdoliva, Luisa. - (2022). (Paper presented at the IEEE International Workshop on Information Forensics and Security, 12 - 16 December) [10.1109/WIFS55849.2022.9975428].

Deepfake audio detection by speaker verification

Alessandro Pianese; Davide Cozzolino; Giovanni Poggi; Luisa Verdoliva
2022



Use this identifier to cite or link to this document: https://hdl.handle.net/11588/926570