Carnegie Mellon University

[Image: a knolling-style flat-lay of a deconstructed hard drive surrounded by magnetic tape, punch cards, memory chips and printed data sheets, several of which are circled in red and connected by red threads to suggest data leakage.]

May 21, 2025

CMU's "Tartan Federer" Team Sweeps All Four Tracks at International AI Privacy Challenge

By Josh Quicksall

Aaron Aupperlee
  • Senior Director of Media Relations, SCS

Carnegie Mellon University's "Tartan Federer" team, led by S3D Assistant Professor Zhiwei Steven Wu, swept all four tracks of the Vector Institute MIDST Challenge at SaTML 2025. The team's membership inference attacks revealed significant privacy vulnerabilities in diffusion models used to generate synthetic tabular data, with implications for industries that rely on AI-generated data to protect sensitive information.

Clean Sweep at International Competition

The Tartan Federer team, made up of members from Carnegie Mellon University's Software and Societal Systems Department (S3D), dominated the MIDST Challenge (Membership Inference over Diffusion-models-based Synthetic Tabular data) at the 3rd IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). The competition ran for two months leading up to the SaTML conference, held April 9–11, 2025, at the University of Copenhagen in Denmark, and drew 71 competing teams from around the world.

Led by S3D faculty member Zhiwei Steven Wu, the team included Xiaoyu (Nicholas) Wu, a visiting master's student from Shanghai Jiao Tong University; Yifei Pang, a research intern at CMU pursuing an MS in Information Security; and Terrance Liu, a PhD student in the Machine Learning Department at CMU.

The team secured first place across all four competition tracks. This complete sweep demonstrates the versatility and effectiveness of their approach across different scenarios and constraints.

The MIDST Challenge specifically focused on testing the privacy vulnerabilities of diffusion models – a class of AI systems that have recently gained popularity for generating synthetic data. Diffusion models work by gradually adding noise to data and then learning to reverse this process, allowing them to create realistic-looking artificial data points.
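
For readers who want a concrete picture, the sketch below (in PyTorch, with an illustrative linear noise schedule and made-up row dimensions – not the challenge models or code) shows the forward "noising" step that a diffusion model learns to reverse:

    # Illustrative forward (noising) step of a Gaussian diffusion model on tabular rows.
    # The schedule and shapes are assumptions for this sketch, not the MIDST models.
    import torch

    T = 1000                                      # number of diffusion time steps
    betas = torch.linspace(1e-4, 0.02, T)         # a common linear noise schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def add_noise(x0, t):
        """Sample x_t ~ q(x_t | x_0): blend clean rows x0 with Gaussian noise."""
        noise = torch.randn_like(x0)
        a = alphas_cumprod[t].sqrt().unsqueeze(-1)          # weight on the clean row
        b = (1.0 - alphas_cumprod[t]).sqrt().unsqueeze(-1)  # weight on the noise
        return a * x0 + b * noise, noise

    x0 = torch.randn(8, 5)          # 8 numeric "rows" with 5 columns, stand-ins for table records
    t = torch.randint(0, T, (8,))   # a random time step per row
    x_t, eps = add_noise(x0, t)
    # Training teaches a network to predict `eps` from (x_t, t); generation then
    # runs that learned denoiser in reverse, starting from pure noise.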

The Vector Institute, a leading AI research organization, designed the challenge to evaluate how vulnerable these systems are to membership inference attacks when used to generate synthetic tabular data. Tabular data – information organized in rows and columns like spreadsheets or databases – is particularly important in fields like healthcare and finance, where privacy concerns are paramount.

The competition was structured around four distinct tracks, testing different aspects of privacy vulnerabilities:

  • White-box tracks gave competitors full access to model parameters and architecture
  • Black-box tracks limited access to only model outputs
  • Single-table tracks focused on standard tabular data (like a single database table)
  • Multi-table tracks addressed more complex relational data across multiple tables

Participants were evaluated on the true positive rate their attacks achieved at a 10% false positive rate – that is, how much of the training data they could correctly identify while falsely flagging no more than 10% of non-member records.
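
Concretely, the metric can be computed roughly as follows (a sketch assuming each attack outputs a membership score per record and ground-truth labels are available for scoring; this is not the official evaluation script):

    # Sketch of the metric: true positive rate at a fixed 10% false positive rate.
    # Labels and scores here are synthetic; a real attack supplies `attack_scores`.
    import numpy as np
    from sklearn.metrics import roc_curve

    def tpr_at_fpr(is_member, attack_scores, target_fpr=0.10):
        """is_member: 1 if a record was in the training data, else 0.
        attack_scores: higher means the attack is more confident the record is a member."""
        fpr, tpr, _ = roc_curve(is_member, attack_scores)
        return float(np.interp(target_fpr, fpr, tpr))

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=1000)
    scores = rng.random(1000)                                   # an uninformative attack
    print(f"TPR @ 10% FPR: {tpr_at_fpr(labels, scores):.3f}")   # near 0.10 for random guessing

A score near 0.10 corresponds to random guessing; higher values mean the attack is genuinely recovering membership in the training set.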

Technical Innovation

The Tartan Federer team's success stemmed from their novel approach to membership inference attacks tailored specifically for diffusion models applied to tabular data. Their research revealed that existing methods designed for image-based diffusion models performed poorly in the tabular domain, necessitating a fresh approach.

"Membership inference turned out to be much trickier on tabular diffusion models than on vision models," said Wu. "But the students on the Tartan Federer team came up with a clever approach – they used loss features across different noise levels and time steps, ran them through a lightweight neural net, and were able to tease out strong membership signals without a lot of manual tuning."
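
The quote only outlines the idea, but the general shape of such an attack might look like the following (a hypothetical illustration – the model interface, time steps, and classifier architecture are assumptions, not the team's published implementation):

    # Hypothetical sketch of the idea in the quote: collect per-record denoising losses
    # at several noise levels / time steps and let a small network score membership.
    # `eps_model` stands in for the target diffusion model's noise predictor, and
    # `alphas_cumprod` for its cumulative noise schedule (as in the earlier sketch).
    import torch
    import torch.nn as nn

    TIMESTEPS = [10, 50, 100, 250, 500, 750, 900]    # illustrative noise levels

    def loss_features(eps_model, x0, alphas_cumprod, timesteps=TIMESTEPS):
        """Per-row denoising error at each chosen time step, stacked into a feature vector."""
        feats = []
        for t in timesteps:
            noise = torch.randn_like(x0)
            a = alphas_cumprod[t].sqrt()
            b = (1.0 - alphas_cumprod[t]).sqrt()
            x_t = a * x0 + b * noise
            pred = eps_model(x_t, torch.full((x0.shape[0],), t))
            feats.append(((pred - noise) ** 2).mean(dim=1))    # per-row loss at this t
        return torch.stack(feats, dim=1)                       # shape: (rows, len(timesteps))

    # A lightweight attack classifier trained on those features (members vs. non-members).
    attack_net = nn.Sequential(nn.Linear(len(TIMESTEPS), 32), nn.ReLU(), nn.Linear(32, 1))
    # membership_score = torch.sigmoid(attack_net(loss_features(eps_model, rows, alphas_cumprod)))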

This approach proved effective across both black-box and white-box settings, as well as in both single-table and multi-table scenarios. Its versatility allowed the team to outperform each track's runners-up, including Yan Pang (White-box Single Table), CITADEL & UQAM (Black-box Single Table), and Cyber@BGU (Black-box Multi-table).

Particularly notable was their performance in the White-box Multi-table track, where they were the only team to beat random guessing, achieving a 35% true positive rate at a 10% false positive rate (random guessing would yield only about 10%).

S3D Connection and Broader Impact

This achievement exemplifies S3D's mission of understanding and improving how computational technologies can better serve societies and communities. The department's interdisciplinary approach, combining rigorous computer science with insights from social sciences and policy studies, provided the ideal environment for this research at the intersection of privacy, security, and responsible AI development.

Wu's leadership in this area aligns perfectly with S3D's research focus on societal computing, particularly privacy and security concerns in modern AI systems. The team's work demonstrates how technical innovation can address real-world challenges in responsible technology development.

The findings have significant implications for organizations working with sensitive data. As diffusion models become increasingly popular for generating synthetic data, understanding their privacy vulnerabilities becomes crucial. The team's work provides valuable insights for developing more robust privacy-preserving techniques in synthetic data generation.

"As organizations increasingly turn to AI-generated data to protect privacy, we need to ensure these approaches actually deliver the privacy benefits they promise."

Publication and Future Directions

The team published their findings on March 15, 2025, in a paper titled "Winning the MIDST Challenge: New Membership Inference Attacks on Diffusion Models for Tabular Data Synthesis" (arXiv: 2503.12008), which was accepted at Theory and Practice of Differential Privacy 2025 (TPDP 2025). They also made their code publicly available on GitHub. This commitment to open research exemplifies S3D's approach to advancing knowledge in responsible computing.

Looking ahead, this work opens several promising research directions, including developing more robust defenses against membership inference attacks and creating diffusion models with improved privacy guarantees. The team's insights will likely influence how privacy-preserving synthetic data is generated and evaluated across industries handling sensitive information.

For S3D, this achievement further establishes the department's leadership in addressing critical challenges at the intersection of technology and society. By developing methods to identify and mitigate privacy risks in advanced AI systems, the department continues its mission of creating computational technologies that responsibly serve communities and society at large.


For more information about the MIDST Challenge, visit the Vector Institute's website at vectorinstitute.github.io/MIDST. To learn more about Wu's research, visit zstevenwu.com.