Synthetic data generation: A tertiary study / Nobani, N., Officioso, G., Pallucchini, F., Sperli', G., Mercorio, F. - In: INFORMATION PROCESSING & MANAGEMENT. - ISSN 0306-4573. - 63:6(2026). [10.1016/j.ipm.2026.104715]
Synthetic data generation: A tertiary study
Sperli' G.; Mercorio F.
2026
Abstract
Synthetic Data Generation (SDG) is expanding rapidly, yet existing surveys differ widely in scope and methodological quality. This tertiary study systematically searched four major scholarly databases (2015-2025) and, after PRISMA screening and DARE-4 appraisal (https://www.york.ac.uk/crd/), identified 17 eligible secondary studies. The evidence reveals a strong concentration in healthcare (58.8% of surveys), limited coverage of non-health domains, and inconsistent reporting of evaluation protocols (e.g., incomplete specification of metrics, data splits, baselines, or evaluation scripts). Fidelity and downstream utility dominate assessment practices, whereas privacy and diversity remain under-examined. Only 4 of 17 surveys provide any reproducibility artefacts. By consolidating these findings, we propose a compact, domain-agnostic evaluation baseline and highlight structural gaps in transparency, domain breadth, and methodological consistency. The study offers actionable guidance for strengthening reproducibility and broadening the evidential foundations of SDG research.
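The abstract reports the healthcare share only as a percentage (58.8%) of the 17 included surveys. The minimal sketch below is a consistency check, not part of the authors' method. It finds which whole-number counts out of 17 round to 58.8%, and computes the share for the 4 surveys with reproducibility artefacts, assuming rounding to one decimal place. The helper names `share` and `counts_matching` are hypothetical.

```python
# Consistency check for the proportions reported in the abstract.
# Assumption: percentages are rounded to one decimal place.
# The helper names below are hypothetical, chosen for illustration only.

TOTAL_SURVEYS = 17   # stated: 17 eligible secondary studies
WITH_ARTEFACTS = 4   # stated: "Only 4 of 17 surveys"


def share(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def counts_matching(pct: float, total: int) -> list[int]:
    """List every whole-number count out of `total` that rounds to `pct`."""
    return [k for k in range(total + 1) if share(k, total) == pct]


# Recover the healthcare count implied by the reported 58.8%.
healthcare_counts = counts_matching(58.8, TOTAL_SURVEYS)
artefact_pct = share(WITH_ARTEFACTS, TOTAL_SURVEYS)

print(f"counts out of {TOTAL_SURVEYS} giving 58.8%: {healthcare_counts}")
print(f"with artefacts: {WITH_ARTEFACTS}/{TOTAL_SURVEYS} = {artefact_pct}%")
```

Only a count of 10 reproduces the reported figure (9/17 and 11/17 give 52.9% and 64.7%). The 4 surveys with artefacts correspond to roughly 23.5% of the corpus.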


