Research: Generative Discovery Beyond the Data
My research centers on exploration for out-of-distribution discovery: from the theoretical foundations of exploration and discovery, to principled flow- and diffusion-based discovery methods, and applications in biochemistry. More broadly, I aim to understand and establish the principles of discovery systems at the intersection of generative modeling, optimization, and sequential decision-making, toward a science of generative discovery.
Mathematical Foundations of Exploration and Discovery
I develop mathematical foundations for exploration and discovery, from discrete dynamical systems to flow- and diffusion-based generative models. My work has shown that maximum-entropy exploration can require non-Markovian policies [M1], developed frameworks for optimizing complex exploration and experimental-design objectives [M2, M3], and studied how geometric priors can reduce the statistical complexity of active exploration [M4]. To understand generative discovery processes, I recently extended these principles to answer questions such as:
Selected Papers
Discovery Algorithms via Flow and Diffusion Models
I develop scalable algorithms that turn flow and diffusion models into practical engines for out-of-distribution discovery. My methods go beyond standard reward-guided fine-tuning. They adapt pre-trained generative models to amplify low-probability modes hidden within the prior, effectively debiasing it from its pre-training data [A1, A2]; to expand into new valid regions through verifier-constrained entropy expansion, yielding higher novelty and diversity [A3]; to target rare, high-value outcomes via distributional fine-tuning [A4, A5]; and to access intermediate states via reward-guided merging [A6]. Concretely, my algorithms contributed to answering questions such as:
Selected Papers
Biochemistry Applications
I bring discovery algorithms to the design of drug-like molecules, therapeutic peptides, and proteins, partnering with academic chemistry and biology labs [e.g., LAB1, LAB2] and industry [e.g., LAB3] to close the loop between generative exploration and out-of-distribution discovery on real-world data. This line of work focuses on translating principled methods into measurable impact for sustainable chemistry and biotechnology. Concretely, I work on questions such as:
Selected Papers
- ICML, Oral Presentation. Constrained Molecular Generation via Sequential Flow Model Fine-Tuning. International Conference on Machine Learning (ICML), 2026. Oral at Frontiers in Probabilistic Inference Workshop at NeurIPS 2025.