Research: Generative Discovery Beyond the Data

My research centers on exploration for out-of-distribution discovery, spanning the theoretical foundations of exploration and discovery, principled flow- and diffusion-based discovery methods, and applications in biochemistry. More broadly, I aim to understand and establish the principles of discovery systems at the intersection of generative modeling, optimization, and sequential decision-making, toward a science of generative discovery.

Research Overview

Mathematical Foundations of Exploration and Discovery

I develop mathematical foundations for exploration and discovery, from discrete dynamical systems to flow- and diffusion-based generative models. My work showed that maximum-entropy exploration can require non-Markovian policies [M1], developed frameworks for optimizing complex exploration and experimental-design objectives [M2, M3], and studied how geometric priors can improve the statistical complexity of active exploration [M4]. To understand generative discovery processes, I recently extended these principles to answer questions such as:

  • How can exploration be formalized on spaces implicitly represented by flow models? [M5, M6]
  • When can local flow expansion yield global coverage of new valid design regions? [M7]
  • What guarantees are possible for distributional flow adaptation beyond average reward? [M8, M9]

Selected Papers

  1. ICML
    Provable Maximum Entropy Manifold Exploration via Diffusion Models
    Riccardo De Santi*, Marin Vlastelica*, Ya-Ping Hsieh, and 3 more authors
    International Conference on Machine Learning (ICML), 2025
  2. ICML, Outstanding Paper
    The Importance of Non-Markovianity in Maximum State Entropy Exploration
    Mirco Mutti*, Riccardo De Santi*, and Marcello Restelli
    International Conference on Machine Learning (ICML), 2022
    Outstanding Paper Award at ICML 2022

See all publications →

Discovery Algorithms via Flow and Diffusion Models

I develop scalable algorithms that turn flow and diffusion models into practical engines for out-of-distribution discovery. My methods go beyond standard reward-guided fine-tuning. They adapt pre-trained generative models to amplify low-probability modes hidden within the prior, effectively debiasing it from its pre-training data [A1, A2]; to expand into new valid regions through verifier-constrained entropy expansion, yielding higher novelty and diversity [A3]; to target rare, high-value outcomes via distributional fine-tuning [A4, A5]; and to access intermediate states via reward-guided merging [A6]. Concretely, my algorithms have contributed to answering questions such as:

  • How can fine-tuning give access to low-probability modes hidden in a pre-trained flow? [A1, A2]
  • How can verifier feedback drive flow expansion into new valid design regions? [A3]
  • How can flow fine-tuning target rare outcomes in the tails of the reward distribution? [A4, A5]

Selected Papers

  1. ICML, Oral Presentation
    A Unified Density Operator View of Flow Control and Merging
    Riccardo De Santi, Malte Franke, Ya-Ping Hsieh, and 1 more author
    International Conference on Machine Learning (ICML), 2026
    Oral at Real-World Constrained and Preference-Aligned Flow and Diffusion-Based Models Workshop at ICLR 2026
  2. ICLR
    Verifier-Constrained Flow Expansion for Discovery Beyond the Data
    Riccardo De Santi*, Kimon Protopapas*, Ya-Ping Hsieh, and 1 more author
    International Conference on Learning Representations (ICLR), 2026
  3. NeurIPS, Spotlight and Oral Presentation
    Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning
    Riccardo De Santi, Marin Vlastelica, Ya-Ping Hsieh, and 3 more authors
    Advances in Neural Information Processing Systems (NeurIPS), 2025
    Spotlight at NeurIPS 2025 and Oral at Workshop on Generative AI and Biology at ICML 2025

Biochemistry Applications

I bring discovery algorithms to the design of drug-like molecules, therapeutic peptides, and proteins, partnering with academic chemistry and biology labs [e.g., LAB1, LAB2] and industry [e.g., LAB3] to close the loop between generative exploration and out-of-distribution discovery on real-world data. This line of work focuses on translating principled methods into measurable impact for sustainable chemistry and biotechnology. Concretely, I work on questions such as:

  • How should we measure diversity, novelty, and coverage across biochemical design spaces? [B1]
  • How can we generate novel molecules that satisfy both functional and synthetic constraints? [B2]
  • How can we trade away average molecular quality to improve the top candidates? [B3, B4]

Selected Papers

  1. ICML, Oral Presentation
    Constrained Molecular Generation via Sequential Flow Model Fine-Tuning
    Sven Gutjahr*, Riccardo De Santi*, Luca Schaufelberger*, and 2 more authors
    International Conference on Machine Learning (ICML), 2026
    Oral at Frontiers in Probabilistic Inference Workshop at NeurIPS 2025
  2. ICML
    Efficient Tail-Aware Generative Optimization via Flow Model Fine-Tuning
    Zifan Wang, Riccardo De Santi, Xiaoyu Mo, and 3 more authors
    International Conference on Machine Learning (ICML), 2026
  3. ICLR
    Verifier-Constrained Flow Expansion for Discovery Beyond the Data
    Riccardo De Santi*, Kimon Protopapas*, Ya-Ping Hsieh, and 1 more author
    International Conference on Learning Representations (ICLR), 2026
