Mahdi Haghifam

alt text  Email(preferred): haghifam.mahdi@gmail.com
Email: mhaghifam@ttic.edu


About Me

I'm an ML researcher working at the intersection of theory and practice. I design algorithms and study their fundamental limits for the properties that matter when ML systems are deployed: efficiency, generalization, privacy, and reliability. My work spans foundations of ML (optimization and generalization), privacy-preserving ML, memorization in learning systems, and inference-time algorithms for LLM safety and efficiency.

I'm currently a Research Assistant Professor at TTIC. Previously, I was a Distinguished Postdoctoral Researcher at Northeastern University, working with Jonathan Ullman and Adam Smith. I completed my Ph.D. in Machine Learning (thesis) at the University of Toronto / Vector Institute, advised by Daniel M. Roy. I also hold B.Sc. and M.Sc. degrees in Electrical Engineering from Sharif University of Technology.

I've worked in industry at Google DeepMind (with Thomas Steinke) and at ServiceNow Research (with Gintare Karolina Dziugaite). See details here.

Selected Papers

  • The Distillation Game in LLMs: Adaptive Attacks & Efficient Defenses
    Y. Allouah*, M. Haghifam*, R. Shokri, S. Koyejo
    Submitted, 2026
    Code

    Defenses against knowledge distillation are typically evaluated against a passive student that trains uniformly on released traces, but a realistic attacker can reweight examples by their learning value. We formalize this as a minimax game between a utility-constrained teacher and an adaptive student, which yields two best-response rules: an adaptive evaluation protocol on the student side and a defense template on the teacher side. From a likelihood-ratio proxy for example value, we derive Product-of-Experts (PoE), a forward-pass-only defense that combines the teacher with a proxy student during generation. On GSM8K and MATH, adaptive evaluation reveals roughly 50% more leakage than passive evaluation against state-of-the-art defenses, and under this stronger evaluation PoE nearly matches the leakage protection of antidistillation sampling at about half the runtime overhead and with higher-quality reasoning traces.
  • Adaptive Generate-and-Verify: Inference-Time Search with Costly Verification
    S. Dughmi, M. Haghifam, Y.H. Kalayci (authors listed in alphabetic order)
    Submitted, 2026
    Code

    A common paradigm for inference-time compute is to generate many candidate responses, rank them with a cheap reward model, and pass the top-ranked ones to an expensive verifier. Since verification is often far more costly than generation and scoring, the policy faces a per-prompt compute allocation problem. We formalize this as active search: a sequential decision problem with a cheap reward-observation cost and an expensive verification cost. Our ADAP algorithm achieves a constant-factor competitive ratio against the distribution-aware optimum under a natural monotonicity assumption on the reward signal, and on HMMT math and LiveCodeBench coding it reaches 100% success rate at roughly 3x to 5x lower mean cost than the best fixed generate-and-verify policy. A matching lower bound via the centered star number separates active search from active learning.
  • The Sample Complexity of Membership Inference and Privacy Auditing
    M. Haghifam, A. Smith, J. Ullman
    Submitted, 2026

    We characterize the sample complexity of membership inference in the fundamental setting of Gaussian mean estimation. The main finding is that standard empirical attacks using O(n), where n is the number of training samples, reference samples can substantially underestimate the true privacy risk, since matching the fully-informed (Bayes-optimal) adversary requires far more reference data than the size of the training set itself, i.e., n^2.
  • Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization
    I. Attias, G. K. Dziugaite, M. Haghifam, R. Livni, D. M. Roy (authors listed in alphabetic order)
    International Conference on Machine Learning (ICML), 2024 (Oral, Best Paper Award: Top 10 of 10,000 submissions)
    Talk

    We establish a strict tradeoff between excess loss and the conditional mutual information any algorithm extracts from its training data in stochastic convex optimization. Achieving small error necessarily requires revealing substantial information about individual samples, which yields lower bounds on memorization and a direct connection to vulnerability against membership inference attacks.
  • Private Geometric Median
    M. Haghifam, T. Steinke, J. Ullman
    Advances in Neural Information Processing Systems (NeurIPS), 2024
    Talk Code

    We design a differentially private algorithm for the geometric median whose accuracy adapts to the effective diameter of the data, with no prior bound on scale or dispersion required. The utility guarantee improves on worst-case bounds whenever the data are concentrated, and the analysis avoids restrictive distributional assumptions.
  • Faster Differentially Private Convex Optimization via Second-Order Methods
    A. Ganesh, M. Haghifam, T. Steinke, A. Thakurta (authors listed in alphabetic order)
    Advances in Neural Information Processing Systems (NeurIPS), 2023
    Code

    We show that second-order information can be used effectively under differential privacy, despite the added noise. Our algorithm, a private variant of the regularized cubic Newton method, adapts to the local shape of the loss and achieves the optimal excess loss for DP-ERM while running 10 to 40 times faster in wall-clock time than DP-SGD at comparable accuracy and privacy.
  • Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms
    M. Haghifam, J. Negrea, A. Khisti, D. M. Roy, G. K. Dziugaite
    Advances in Neural Information Processing Systems (NeurIPS), 2020
    Talk Code

    We give data-dependent generalization bounds for noisy iterative algorithms such as SGLD using the conditional mutual information framework. Unlike worst-case Lipschitz- or norm-based bounds, the resulting estimates depend on the algorithm's actual trajectory on the training data and remain non-vacuous in modern overparameterized settings.

Industry Internship Experience

Google DeepMind| Research Intern
Mountain View| September 2022 – December 2022
Mentors: Thomas Steinke
- Developed second-order differentially private optimization method achieving 10–40× wall-clock speedups over DP-SGD baselines at comparable accuracy/privacy
- Resulted in publications at NeurIPS 2023 Paper Code and ICML 2023 Paper

ServiceNow Research | Research Intern
Toronto | November 2020 – March 2021
Mentor: Gintare Karolina Dziugaite
- Studied the connections between different generalization approaches in ML
- Resulted in publication in NeurIPS 2021 (Spotlight) Paper

ServiceNow Research | Research Intern
Toronto | February 2019 – May 2019
Mentor: Gintare Karolina Dziugaite
- Proposed data-dependent generalization estimates for noisy SGD / SGLD using gradient disagreement. NeurIPS 2019 Paper
- The proposed method achieves for the first time non-vacuous generalization bound in various modern ML setups.
- Built generalization-prediction tools in TensorFlow for CNNs and MLP on image classification tasks. Code

Contact Me!

Feel free to reach out if you'd like to discuss research ideas. Also, I'm happy to offer guidance and support to those applying to graduate programs, especially individuals who might not typically have access to such assistance