Molecular Informatics Team Seminar (Talk by Haoyang (Oscar) Wu, MIT ChemE-CSE).

Thu, 17 Apr 2025 09:00 - 10:00 JST
Free admission

Description

This is an online seminar. Registration is required.

【Team】Molecular Informatics Team
【Date】2025/April/17(Thursday) 9:00-10:00(JST)
【Speaker】Haoyang (Oscar) Wu, MIT ChemE-CSE

Title: Uniting High-throughput Quantum Chemistry and Graph Neural Networks to Enhance Property Prediction for Autonomous Molecular Discovery

Abstract: Exploring, realizing, and validating novel materials with desired functional
properties is crucial for breakthroughs in health, energy, and sustainability. Autonomous
molecular design-make-test-analyze (DMTA) workflows hold promise for accelerating
such discoveries, yet their effectiveness in active-learning material design campaigns
often hinges on the availability of reliable and robust molecular property and chemical
reactivity prediction models. Due to the iterative nature of DMTA active-learning
cycles, even minor gains or shortfalls in model accuracy per cycle
can accumulate over time and critically affect overall success. Traditional graph neural
networks (GNNs), including directed message-passing neural networks (D-MPNNs),
often struggle with model accuracy and generalizability in the early stages of discovery
due to limited experimental data. Our QM-GNN approach addresses this challenge by
strategically integrating quantum mechanical (QM) descriptors derived from extensive
high-throughput computational chemistry calculations with GNNs, infusing property
prediction machine learning models with physicochemical insights to boost their
performance. This talk will showcase the effectiveness of the QM-GNN approach in
accurately predicting regioselectivity in electrophilic substitution reactions. To further
explore the effectiveness of the QM-GNN approach, I will present our recent systematic
investigation of the impact of atom, bond, and molecular QM descriptors on the
performance of D-MPNNs for predicting 16 diverse molecular properties. The analysis
surveys computational and experimental targets, classification and regression tasks, and
varied dataset sizes from several hundred to hundreds of thousands of datapoints. Our
results indicate that QM descriptors are mostly beneficial to D-MPNN
performance on small datasets, provided that the descriptors correlate well with the
targets and can be readily computed at high accuracy. Subsequently, I will discuss a set
of practical guidelines regarding when and how to best leverage QM descriptors. Finally,
this talk will highlight our recent creation of a one-of-a-kind QM dataset of more than
200,000 DFT transition states and DLPNO-CCSD(T)-F12d reaction barrier heights,
along with 100 million COSMO-RS solvation free energies generated from a high-throughput
computational chemistry workflow. The creation of this dataset was inspired
by our previous ab initio kinetics studies modeling the oxidation of drug molecules, and
we will demonstrate promising preliminary results of D-MPNN models trained on the
dataset that can potentially excel at predicting the oxidative stability of diverse
materials. Overall, our work illustrates that uniting high-throughput quantum chemistry
and machine learning can not only overcome many challenges posed by experimental
data scarcity but also provide additional physicochemical insights and contribute
to a substantial step forward in autonomous molecular discovery.
Keywords: Physics-informed machine learning, High-throughput quantum
chemistry, Graph neural network for molecular property prediction.

About this community

RIKEN AIP Public

Public events of RIKEN Center for Advanced Intelligence Project (AIP)
