I'm an ML researcher working at the intersection of theory and practice. I design algorithms and study their fundamental limits for the properties that matter when ML systems are deployed: efficiency, generalization, privacy, and reliability. My work spans foundations of ML (optimization and generalization), privacy-preserving ML, memorization in learning systems, and inference-time algorithms for LLM safety and efficiency.
Defenses against knowledge distillation are typically evaluated against a passive student that trains uniformly on released traces, but a realistic attacker can reweight examples by their learning value. We formalize this as a minimax game between a utility-constrained teacher and an adaptive student, which yields two best-response rules: an adaptive evaluation protocol on the student side and a defense template on the teacher side. From a likelihood-ratio proxy for example value, we derive Product-of-Experts (PoE), a forward-pass-only defense that combines the teacher with a proxy student during generation. On GSM8K and MATH, adaptive evaluation reveals roughly 50% more leakage than passive evaluation against state-of-the-art defenses, and under this stronger evaluation PoE nearly matches the leakage protection of antidistillation sampling at about half the runtime overhead and with higher-quality reasoning traces.
A common paradigm for inference-time compute is to generate many candidate responses, rank them with a cheap reward model, and pass the top-ranked ones to an expensive verifier. Since verification is often far more costly than generation and scoring, the policy faces a per-prompt compute allocation problem. We formalize this as active search: a sequential decision problem with a cheap reward-observation cost and an expensive verification cost. Our ADAP algorithm achieves a constant-factor competitive ratio against the distribution-aware optimum under a natural monotonicity assumption on the reward signal, and on HMMT math and LiveCodeBench coding it reaches 100% success rate at roughly 3x to 5x lower mean cost than the best fixed generate-and-verify policy. A matching lower bound via the centered star number separates active search from active learning.
We characterize the sample complexity of membership inference in the fundamental setting of Gaussian mean estimation. The main finding is that standard empirical attacks using O(n), where n is the number of training samples, reference samples can substantially underestimate the true privacy risk, since matching the fully-informed (Bayes-optimal) adversary requires far more reference data than the size of the training set itself, i.e., n^2.
We establish a strict tradeoff between excess loss and the conditional mutual information any algorithm extracts from its training data in stochastic convex optimization. Achieving small error necessarily requires revealing substantial information about individual samples, which yields lower bounds on memorization and a direct connection to vulnerability against membership inference attacks.
We design a differentially private algorithm for the geometric median whose accuracy adapts to the effective diameter of the data, with no prior bound on scale or dispersion required. The utility guarantee improves on worst-case bounds whenever the data are concentrated, and the analysis avoids restrictive distributional assumptions.
We show that second-order information can be used effectively under differential privacy, despite the added noise. Our algorithm, a private variant of the regularized cubic Newton method, adapts to the local shape of the loss and achieves the optimal excess loss for DP-ERM while running 10 to 40 times faster in wall-clock time than DP-SGD at comparable accuracy and privacy.
We give data-dependent generalization bounds for noisy iterative algorithms such as SGLD using the conditional mutual information framework. Unlike worst-case Lipschitz- or norm-based bounds, the resulting estimates depend on the algorithm's actual trajectory on the training data and remain non-vacuous in modern overparameterized settings.
Industry Internship Experience
Google DeepMind| Research Intern Mountain View| September 2022 – December 2022
Mentors: Thomas Steinke
- Developed second-order differentially private optimization method achieving 10–40× wall-clock speedups over DP-SGD baselines at comparable accuracy/privacy
- Resulted in publications at NeurIPS 2023
Paper Code
and ICML 2023
Paper
ServiceNow Research | Research Intern Toronto | November 2020 – March 2021
Mentor: Gintare Karolina Dziugaite
- Studied the connections between different generalization approaches in ML
- Resulted in publication in NeurIPS 2021 (Spotlight)
Paper
ServiceNow Research | Research Intern Toronto | February 2019 – May 2019
Mentor: Gintare Karolina Dziugaite
- Proposed data-dependent generalization estimates for noisy SGD / SGLD using gradient disagreement. NeurIPS 2019
Paper
- The proposed method achieves for the first time non-vacuous generalization bound in various modern ML setups.
- Built generalization-prediction tools in TensorFlow for CNNs and MLP on image classification tasks.
Code
Contact Me!
Feel free to reach out if you'd like to discuss research ideas. Also, I'm happy to offer guidance and support to those applying to graduate programs, especially individuals who might not typically have access to such assistance