Efficient Exploration of Chemical Kinetics -- Development and application of tractable Gaussian Process Models
Files
rgThesis_v2.pdf (19 MB)
Date
Authors
Publisher
University of Iceland, School of Engineering and Natural Sciences, Faculty of Physical Sciences
Abstract
Controlling chemical systems in space and time to influence interacting chemical reactions has been the goal of chemistry since the days of alchemy. Today, estimating the products and rates of chemical reactions, along with the stability of chemicals and materials, is a fundamental task in the chemical industry. Despite leaps in mathematical modeling, with accurate descriptions of electronic structure for many-body quantum systems, and despite access to greatly increased (exascale) computing power, efficient methods for determining reaction rates in large simulations are still lacking. Direct simulation of atomic dynamics is limited to short time scales and small length scales. Recently there has been rapid progress in the construction of machine learned potential functions, but they require large data sets as input and are not practical when the task is to quickly screen thousands of chemicals or materials to find the best candidates for technological use. Furthermore, they have so far been limited to regions where the atoms are in stable configurations and are not reliable for the transition state regions, which largely determine the reaction rate. Attempts to explore reaction networks automatically with sufficient accuracy incur too high a cost in electronic structure calculations. Simplifying approximations for rate calculations assume that chemical reactions are slow processes compared with atomic vibrations, so that thermal equilibrium is reached, and therefore use statistical approximations to calculate reaction rates. In the simplest approximation, the harmonic approximation to transition state theory, they come down to finding first-order saddle points on the energy surface that describes how the system's energy depends on the positions of the atoms. Even then, the computational effort of saddle point searches is too large in many cases, especially when the energy and atomic forces are obtained from electronic structure calculations.
Acceleration of saddle point searches based on surrogate models has been described as promising for nearly a decade, but in practice it has been hampered by large overhead and numerical instabilities that cancel out the gain in wall time. This dissertation presents a solution based on a holistic approach to this problem that integrates the design of the physical representation, the statistical model, and the systems architecture. This philosophy is embodied in the Optimal Transport Gaussian Process (OT-GP) framework, which uses a physics-aware representation based on optimal transport metrics to create a compact and chemically relevant surrogate for the potential energy surface. This defines a statistically robust approach and uses targeted sampling to reduce the computational effort. Alongside a rewrite of the EON software for long-timescale simulations, a reinforcement-learning approach is presented for the minimum mode following method, used when the final state is not specified, and for the nudged elastic band method, used when both the initial and final states are specified. Together, these advances mark a new representation-first, service-oriented paradigm for the simulation of chemical reactions. The success of this methodology is demonstrated with large benchmarks that show good performance, analyzed with Bayesian models. By developing a framework for high-performance open-source tools, this work turns a long-standing theoretical promise into a practical tool for exploring the mechanisms and rates of chemical reactions.
Spatio-temporal control of chemical systems to tune the relative rates of competing reactions has been the goal of chemistry since early alchemy. Today, estimating the products and rates of chemical reactions, as well as the stability of chemicals and materials, is a fundamental task for the chemical industry. Despite leaps in mathematical modeling, with insightful representations of electronic structure to describe many-body quantum systems, and in spite of exascale computing resources, efficient methods for determining reaction rates in large-scale simulations have remained out of reach. Direct simulation of atomic dynamics is limited to short time scales and small length scales. Recently, there have been rapid advances in the generation of machine learned potential functions, but they require large data sets as input and are not practical when the task is to quickly screen thousands of chemicals or materials to identify optimal candidates for technological applications. Furthermore, they have so far been limited to regions of stable atomic configurations and are not reliable for the transition state regions, which are needed for estimating reaction rates. Attempts to explore reaction networks in an automated manner at sufficient accuracy suffer from the large computational cost of the electronic structure calculations. Simplifying approximations for rate calculations recognise that reactions are slow processes on the time scale of atomic vibrations and thermal equilibration, and make use of statistical approximations for chemical rate calculations. In the simplest of these, the harmonic approximation to transition state theory, they boil down to finding first-order saddle points on the energy surface describing how the system's energy depends on the positions of the atoms.
Even so, the computational effort in saddle point searches is prohibitively large in many cases, especially when the energy and atomic forces are obtained from electronic structure calculations. Surrogate-model-based acceleration of saddle point searches has been described as promising for almost a decade now, but in practical terms it has remained crippled by large computational overhead and numerical instabilities that negate the advantage in wall time. This dissertation presents a solution based on a holistic approach that co-designs the physical representation, statistical model, and systems architecture. This philosophy is embodied in the Optimal Transport Gaussian Process (OT-GP) framework, which uses a physics-aware representation based on optimal transport metrics to create a compact and chemically relevant surrogate of the potential energy surface. This defines a statistically robust approach and uses targeted sampling to reduce the computational effort. Alongside rewrites of the EON software for long-timescale simulations, we present a reinforcement-learning approach for the minimum-mode following method, used when the final state is not known, and for the nudged elastic band method, used when both the initial and final states are specified. Collectively, these advances establish a representation-first, service-oriented paradigm for chemical kinetics simulations. The success of this paradigm is demonstrated through large-scale benchmarks where the framework shows state-of-the-art performance characteristics, validated with Bayesian hierarchical models. By delivering a framework for high-performance open-source tooling, this work transforms a long-held theoretical promise into a practical engine for exploring chemical kinetics.
Description
Keywords
Chemical Reaction Dynamics, Gaussian Process Regression, Computational Representation, Statistical Machine Learning, High-Performance Simulation, Doctoral theses, Quantum mechanics