Graphcore helps PNNL accelerate 3D molecular modelling with GNNs

Added: 25th October 2023 by Graphcore

Training computer models for applications in computational chemistry can now be significantly accelerated thanks to the partnership between Graphcore and Pacific Northwest National Laboratory (PNNL).

The results of this partnership show that the training time for molecular graph neural networks (GNNs) can be drastically reduced by pretraining and finetuning the model, using Graphcore Intelligence Processing Units (IPUs) as the artificial intelligence (AI) accelerator.
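To make the pretrain-then-finetune idea concrete, the minimal PyTorch sketch below shows the general pattern: train a model on a large upstream dataset, save its weights, then reuse them as the starting point for a downstream task. The model, file names and training steps are placeholders for illustration, not the code used in the paper.

```python
import torch
from torch import nn

class MolecularGNN(nn.Module):
    """Placeholder model standing in for a molecular GNN such as SchNet."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Linear(3, hidden)   # stands in for the message-passing layers
        self.head = nn.Linear(hidden, 1)      # predicts a scalar property (e.g. an energy)

    def forward(self, x):
        return self.head(self.encoder(x))

# --- Pretraining on the large upstream dataset (e.g. accelerated on IPUs) ---
model = MolecularGNN()
# ... train on the full pretraining set here ...
torch.save(model.state_dict(), "pretrained_gnn.pt")

# --- Finetuning on a downstream task ---
finetuned = MolecularGNN()
finetuned.load_state_dict(torch.load("pretrained_gnn.pt"))
finetuned.head = nn.Linear(128, 1)            # optionally re-initialise the prediction head
# ... continue training on the (much smaller) downstream dataset ...
```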

These results were published online in Reducing Down(stream)time: Pretraining Molecular GNNs using Heterogeneous AI Accelerators [1] and recently presented at the NeurIPS Workshop on Machine Learning and the Physical Sciences.

Facilitating Discoveries Through Collaboration

PNNL is a leading center for scientific discovery in chemistry, data analytics, Earth and life sciences, and for technological innovation in sustainable energy and national security.

By synergizing PNNL’s historic strength in chemistry with Graphcore’s advanced AI systems, the collaboration between the two enables the accelerated training of molecular GNNs.

The work thus far has focused on the SchNet GNN architecture [2] applied to a large dataset—the HydroNet dataset [3]—comprising five million geometric structures of water clusters.

This GNN is designed to explore the geometric structure–function relationship of molecules at a fraction of the computational cost of traditional computational chemistry methods.

Using the HydroNet dataset for this study was a natural fit—not only is it the largest collection of data for water cluster energetics reported to date, but its development was also spearheaded by PNNL researchers.

Accelerating Computational Chemistry with IPUs

The SchNet model works by learning a mapping from a molecular structure to a quantum chemistry property. PNNL and Graphcore researchers are training the GNN to predict the binding energy of water clusters.
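As a rough illustration of that structure-to-property mapping, the sketch below runs an (untrained) SchNet model from PyTorch Geometric on a single water molecule; the hyperparameters and geometry are illustrative only and are not the settings used in this work.

```python
import torch
from torch_geometric.nn import SchNet  # requires PyTorch Geometric (and its neighbour-search backend)

# Untrained SchNet mapping a 3D structure to one scalar property per structure.
model = SchNet(hidden_channels=128, num_interactions=6, cutoff=6.0)

# A single water molecule: atomic numbers and 3D coordinates (Angstrom).
z = torch.tensor([8, 1, 1])                        # O, H, H
pos = torch.tensor([[ 0.000, 0.000, 0.000],
                    [ 0.757, 0.586, 0.000],
                    [-0.757, 0.586, 0.000]])
batch = torch.zeros(3, dtype=torch.long)           # all three atoms belong to structure 0

energy = model(z, pos, batch)                      # shape [1, 1]: one prediction per structure
print(energy)
```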

This energy is simply understood as a quantitative measure of how strongly the entire network of atoms is held together by chemical bonds.   
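For an n-molecule water cluster, one common convention (an assumption here; the article does not spell it out) is

E_bind = E(cluster) − n · E(H2O monomer)

so a more negative binding energy corresponds to a more strongly bound cluster.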

The HydroNet dataset includes the binding energies, which have been computed using highly accurate quantum-chemistry methods [4].

The chemical bonds within a water cluster include both strong covalent bonds between nearby atoms and long-range interactions through hydrogen bonding.

These interactions are notoriously difficult to model efficiently. The SchNet GNN handles them neatly in two ways: a spatial cutoff defines pairwise interactions whose number grows linearly with the number of atoms being modelled, and multiple message-passing steps propagate those pairwise interactions across the entire structure, capturing the many-body nature of the bonding network.
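A hedged sketch of the spatial-cutoff idea, using PyTorch Geometric's radius_graph to build the neighbour list: only atom pairs closer than the cutoff become edges, so the number of pairwise interactions stays roughly linear in the number of atoms rather than quadratic. The cutoff value and random coordinates below are placeholders.

```python
import torch
from torch_geometric.nn import radius_graph  # requires PyTorch Geometric (and its neighbour-search backend)

cutoff = 6.0                                  # Angstrom; illustrative value only
pos = torch.randn(30, 3) * 5.0                # 30 atoms with random 3D positions
batch = torch.zeros(30, dtype=torch.long)     # all atoms belong to one structure

# Each column of edge_index is one (source, target) pair within the cutoff radius.
edge_index = radius_graph(pos, r=cutoff, batch=batch)
print(edge_index.shape)                       # [2, num_edges]
```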

Initial investigations into using SchNet on the HydroNet dataset have been limited to just 10% of the total dataset [5].

Even with the dataset limited in this way, the reported time-to-train was 2.7 days, with a time-per-epoch of 4.5 minutes on four NVIDIA V100 GPUs.

The graph below shows the speedup in time-per-epoch achieved by Graphcore-accelerated training of SchNet on the same 10% subset of the HydroNet dataset.
