dependencies

Dataset Overview

For this project, I used the Open Catalyst 2020 (OC20) dataset.

A few important points:

→ Data is stored as PyTorch Geometric Data objects inside LMDB files.

→ For each task, training splits come in several sizes.

→ Validation/test splits are broken into four subsplits:

   → in domain (ID)

   → out of domain adsorbate (OOD-Ads)

   → out of domain catalyst (OOD-Cat)

   → out of domain adsorbate and catalyst (OOD-Both)
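To keep the four evaluation subsplits straight when looping over them, a small lookup table can help. This is only a sketch: the short keys (`id`, `ood_ads`, ...) and the `val_`/`test_` prefixes are naming assumptions made for illustration, and `split_names` is a hypothetical helper, so check them against the folders you actually downloaded.

```python
# Subsplit labels taken from the list above; the short keys are assumed names.
SUBSPLITS = {
    "id": "in domain (ID)",
    "ood_ads": "out of domain adsorbate (OOD-Ads)",
    "ood_cat": "out of domain catalyst (OOD-Cat)",
    "ood_both": "out of domain adsorbate and catalyst (OOD-Both)",
}


def split_names(stage):
    """Return one name per subsplit for a stage such as 'val' or 'test'."""
    if stage not in ("val", "test"):
        raise ValueError(f"expected 'val' or 'test', got {stage!r}")
    return [f"{stage}_{key}" for key in SUBSPLITS]


for name, (key, label) in zip(split_names("val"), SUBSPLITS.items()):
    print(f"{name:14s} {label}")
```

Reporting metrics separately for each of the four names makes it easier to see whether a model generalizes to unseen adsorbates, unseen catalysts, or both.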

Train

Val/test

For tutorial purposes, OC20 offers smaller splits (100 train, 20 val for all tasks) so users can easily store, train, and predict across the various tasks.

Data visualization:

import matplotlib
matplotlib.use('Agg')

import os
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

params = {
   'font.size': 14,
   'font.family': 'DejaVu Sans',   # no leading space, otherwise the font lookup fails
   'legend.fontsize': 20,
   'xtick.labelsize': 20,
   'ytick.labelsize': 20,
   'axes.labelsize': 25,           # the duplicate 'axes.labelsize': 14 entry was overridden by this one
   'axes.titlesize': 25,
   'text.usetex': False,
   'figure.figsize': [12, 12]
}
matplotlib.rcParams.update(params)

import ase.io
from ase.io.trajectory import Trajectory
from ase.io import extxyz
from ase.calculators.emt import EMT
from ase.build import fcc100, add_adsorbate, molecule
from ase.constraints import FixAtoms
from ase.optimize import LBFGS
from ase.visualize.plot import plot_atoms
from ase import Atoms
from IPython.display import Image

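matplotlib.use('Agg') selects the "Agg" backend (short for "Anti-Grain Geometry"), a non-interactive backend that renders figures to image files rather than opening a window on screen. The %matplotlib inline magic that follows switches the notebook to the inline backend, so figures also appear in the cell output.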

The params dictionary sets default options for matplotlib: the font size and family, the size of the axis labels, tick labels, legend, and title text, and the figure size.

The matplotlib.rcParams.update(params) line overrides matplotlib's defaults with the values in the params dictionary.
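As a quick check that the backend and rcParams settings behave as described, the sketch below draws a throwaway figure and saves it to a temporary file. The plotted data and the file name `example.png` are placeholders I made up for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: figures go to files, not a window
import matplotlib.pyplot as plt

# A reduced version of the params dictionary, for illustration only.
params = {"font.size": 14, "axes.labelsize": 25, "figure.figsize": [12, 12]}
matplotlib.rcParams.update(params)

# New figures pick up the updated defaults automatically.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("x")
ax.set_ylabel("y")

out_path = os.path.join(tempfile.mkdtemp(), "example.png")  # placeholder file name
fig.savefig(out_path)
fig_size = [float(v) for v in fig.get_size_inches()]
plt.close(fig)

print(matplotlib.get_backend(), fig_size, out_path)
```

Because rcParams is global, the update affects every figure created afterwards in the same session, which is why the tutorial sets it once at the top.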

The rest of the code imports functions and classes from the ase and IPython modules, which are used for reading and writing atomic simulation data, building and optimizing atomic structures, and displaying images in the notebook.

Understanding the data