Innovations in Data Synthesis (a SPRITE+ innovation forum)
24th - 25th July 2025

Principal Investigator: Mark Elliot
Co-Investigators: N/A
Event attendees: 29
Project overview
Synthetic data, from generative models to rule-based simulation, offers a methodology where privacy protection, data utility, and innovation intersect. This forum convened experts across disciplines to consider how emerging methods (like GANs, VAEs, diffusion models, and CART) are being applied in practice. We critically examined current understandings of the trade-offs between utility, fidelity and risk, explored practical deployment strategies, and identified future needs for regulation, tooling, and standards.
Through focused discussion, we aimed to generate a position paper and set the agenda for a new community of practice at the intersection of data synthesis and TIPS (Trust, Identity, Privacy and Security).
Participants listened to a series of short talks:
Two Worlds United? Synthetic Data in Computer Science and Statistics - Jörg Drechsler
Generating Private Databases with MCMC - Harry McArthur
Synthetic Tabular Data Generation with public/private data - Graham Cormode
Flow matching for tabular data synthesis - Bahrul Nasution
Using saturated count models for data synthesis - Robin Mitra
Training synthetic data generators in federated settings - Rudolf Mayer
Using Synthetic data as an attack vector and for Output risk assessment - Mark Elliot
Generative AI Software for High-Fidelity Household Load Profiles - Gus Chadney
A paradigm for creating synthetic data with utility and privacy assessment - Gillian Raab
Syndiffix - Paul Francis
Using agent-based modelling to generate synthetic data - Jools Kasmire
DARE UK Synthetic Data Community Group - Developing Governance, Standards and Tools for Synthetic Data - Lewis Hotchkiss
The Simulacrum: Enabling real-world patient studies with privacy-preserving synthetic data - Lora Frayling
Synthetic data, a data owner's perspective - Iain Dove
There were also three breakout sessions in which participants focused first on the challenges and then on specific topics, in order to produce agenda-setting problem statements.
The innovation forum was very collegial, and we made good progress over the two days. An embryonic position paper has been produced, and a writing group has formed to work that up further.
Project outputs
Immediate Outputs: A four-page document which summarises the joint thinking from the workshop attendees on the governance, regulation and research agendas for data synthesis.