Synthetic data generation: A tertiary study / Nobani, N., Officioso, G., Pallucchini, F., Sperli', G., Mercorio, F. - In: INFORMATION PROCESSING & MANAGEMENT. - ISSN 0306-4573. - 63:6(2026). [10.1016/j.ipm.2026.104715]

Synthetic data generation: A tertiary study

Nobani N.; Officioso G.; Pallucchini F.; Sperli' G.; Mercorio F.
2026

Abstract

Synthetic Data Generation (SDG) is expanding rapidly, yet existing surveys differ widely in scope and methodological quality. This tertiary study systematically searched four major scholarly databases over the period 2015-2025 and, after PRISMA screening and DARE-4 quality appraisal (https://www.york.ac.uk/crd/), identified 17 eligible secondary studies. The evidence reveals a strong concentration on healthcare (58.8% of surveys, i.e., 10 of 17), limited coverage of non-health domains, and inconsistent reporting of evaluation protocols (e.g., incomplete specification of metrics, data splits, baselines, or evaluation scripts). Fidelity and downstream utility dominate assessment practices, whereas privacy and diversity remain under-examined. Only 4 of the 17 surveys provide any reproducibility artefacts. Consolidating these findings, we propose a compact, domain-agnostic evaluation baseline and highlight structural gaps in transparency, domain breadth, and methodological consistency. The study offers actionable guidance for strengthening reproducibility and broadening the evidential foundations of SDG research.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/1052225