Publications
A Review of Molecular Representation in the Age of Machine Learning
WIREs Computational Molecular Science, 2022
Research in chemistry increasingly requires interdisciplinary work prompted by, among other things, advances in computing, machine learning, and artificial intelligence. Everyone working with molecules, whether chemist or not, needs an understanding of the representation of molecules in a machine-readable format, as this is central to computational chemistry. Four classes of representations are introduced: string-, connection table-, feature based-, and computer learned-representations. Three of the most significant representations are SMILES, InChI, and the MDL molfile, of which SMILES was the first to successfully be used in conjunction with a variational autoencoder to yield a continuous representation of molecules. This is noteworthy because a continuous representation allows for efficient navigation of the immensely large chemical space of possible molecules. Since 2018, when the first model of this type was published, considerable effort has been put into developing novel and improved methodologies. Most, if not all, researchers in the community make their work easily accessible on GitHub, though discussion of computation time and domain of applicability is often overlooked. Herein we present questions for consideration in future work which we believe will make chemical variational autoencoders even more accessible.
Multi-task Bayesian Optimisation of Chemical Reactions
NeurIPS Machine Learning for Molecules, 2020
Recent work has shown how Bayesian optimization (BO) is an efficient method for optimizing expensive experiments such as chemical reactions. However, in previous studies, each optimization has been started from scratch with no information about previous or similar chemical optimization studies. Therefore, BO can still require more iterations than many experimental budgets provide. Here, we overcome this challenge using multi-task BO. Through in silico benchmarking studies, we show how past experimental data can be leveraged to improve the quality and speed of reaction optimization.
A Framework for Biogas Exploitation in Italian Waste Water Treatment Plants
Chemical Engineering Transactions, 2019
Effective utilisation of biogas is an important step in increasing usage of renewable energy, owing to its great flexibility, which solar and wind power in particular lack. Biogas generated through anaerobic digestion (AD) of sewage sludge addresses environmental concerns together with creating electricity generation potential. There is currently no optimisation-based decision-support framework to determine the best use of biogas from a Waste Water Treatment Plant (WWTP), and provide a market outlook for each of the options. This work proposes a novel multi-period Mixed Integer Linear Program (MILP) model for dispatch and selection of technologies capable of exploiting biogas produced from sludge. The novelty is also highlighted by extrapolating the optimised results to a broader analysis of 855 Italian WWTPs with Population Equivalent (P.E.) > 20,000. The use of real input data provides a unique added value to the work. The modelling framework is applied to several case studies. Results show that 7–23 % savings in operating costs are possible from integrating three systems to exploit biogas, and the trade-offs between capital and operating costs affect the optimal system choice. Furthermore, market driven scenarios are used to analyse how to improve the economic performance.
Recent Presentations
June 2022
Cambridge Chemical Engineering and Biotechnology Conference
In silico prediction of PDB by a mechanistic DFT-aided algorithm
Explaining how a network of linear equations fitted to energy difference calculations performed using Density Functional Theory (DFT) can be used to predict protodeboronation (PDB).
April 2022
Guest Speaker, Amaro & McCammon Labs
UC-San Diego, US (virtual)
A practical guide to machine-readable molecular representation
In this talk I presented an overview of molecular representation, walked through reading and writing SMILES and MDL molfiles, and finished with a deep dive into the ECFP algorithm.
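As a small illustration of what reading SMILES involves, the sketch below splits a SMILES string into tokens and counts heavy atoms. It is not taken from the talk: the regular expression and the names `tokenize_smiles` and `count_heavy_atoms` are assumptions made for this example, and the pattern covers only common SMILES syntax (organic-subset atoms, bracket atoms, bonds, branches, ring closures).

```python
import re

# Assumed token pattern: bracket atoms first, then two-letter halogens,
# then single-letter atoms, ring-closure labels, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#$:/\\\-+().~@*])"
)

# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognised characters in SMILES: {smiles!r}")
    return tokens


def count_heavy_atoms(smiles):
    """Count non-hydrogen atom tokens in a SMILES string."""
    count = 0
    for token in tokenize_smiles(smiles):
        if token.startswith("["):
            match = BRACKET_SYMBOL.match(token)
            # Explicit hydrogens such as [H] or [2H] are not heavy atoms.
            if match and match.group(1) != "H":
                count += 1
        elif token[0].isalpha():
            count += 1
    return count


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
print(tokenize_smiles(aspirin))
print(count_heavy_atoms(aspirin))  # 13, matching C9H8O4
```

A tokenizer of this kind only checks that each character is recognised. It does not validate chemistry, so a cheminformatics toolkit is still needed to parse the tokens into a molecular graph.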
March 2022
Lapkin Lab Machine Learning Subgroup
Cambridge, UK
Molecular Representation Workshop
In this workshop I presented an overview of molecular representation and set practice problems for understanding SMILES and DeepSMILES.
February 2022
St Edmund's Student Conference
Cambridge, UK
Teaching Chemistry to Computers
How are molecules represented to be understandable to computers? Based on my recently accepted review paper, I explained feature engineering for chemistry.
January 2022
Aspect Capital ML Reading Group
London, UK (virtual)
Multi-task Bayesian Optimisation of Chemical Reactions
Outlining the results from my conference paper on the same topic, which I presented at NeurIPS 2020.
May 2021
Lapkin Lab Machine Learning Subgroup
Cambridge, UK (virtual)
ChemDraw Workshop
You may be using ChemDraw in your research, but are you aware of all the shortcuts and hotkeys designed to speed up your work?
April 2021
Joint CDT Conference Data Driven Chemical Synthesis and Catalysis, UK (virtual)
Multi-task Bayesian Optimisation
This presentation was based on my NeurIPS conference paper.
December 2020
NeurIPS Machine Learning for Molecules
(virtual)
Multi-task Bayesian Optimisation of Chemical Reactions
Presented a poster based on the accepted manuscript, which is listed under Publications above.
Course List
PhD Machine Learning for Chemistry
3rd year
Cambridge Rising Stars (public engagement), Career Progression in Academia and Industry, Statistics for Chemical Engineers (CET IIA Statistics)
2nd year
EnterpriseTECH: A hands-on entry-level entrepreneurship programme. Our team developed a commercialisation strategy for a by-product formed when extracting fluids with medical applications from watercress.
1st year
Computational Parametrisation, Green Chemistry, Introduction to Computer Science and Programming Using Python, Introduction to Probabilistic Modelling, Machine Learning in Chemistry 101, Model Development and Model Based DoE, Philosophy for Chemists, Responsible Research and Innovation, Science Communication (in Science, Media, and Business), Statistics for Chemists, The Drug Discovery Process.
Course List
MEng Chemical Engineering
4th year
Research project: Optimisation of Biogas Exploitation in Waste Water Treatment Plants using GAMS.
Design project: Technical design for continuous Lapatinib production plant in South Africa.
Advanced Bioprocess Engineering, Business Economics, Corporate Finance, Dynamic Behaviour of Process Systems, Modelling of Biological Systems, Transport Processes in Biological Systems.
3rd year
Research project: Predicting Protein Aggregation using Self-Interaction Chromatography
Coursework projects: Flowsheeting, Mechanical Design, Techno-economic Evaluation
Accounting, Biochemical Engineering, Environmental Engineering, Numerical Methods, Project Management, Reaction Engineering II, Safety and Loss Prevention, Strategy of Design, Transfer Processes III.
2nd year
Research project: Optimising performance of a propane-fed power plant
Coursework projects: Carbon Capture Pilot Plant, Reactor Design and Control
Biochemistry, Business for Engineers II, Development of Business and Economic Ideas 1800-2010, Industrial Chemistry, Mathematics II, Process Dynamics and Control, Reaction Engineering I, Separation Processes II, Thermodynamics II, Transfer Processes II.
1st year
Coursework projects: Foundational lab work
Business for Engineers I, Chemistry I, Introduction to Management, Introduction to MATLAB, Mathematics I, Process Analysis, Professional Skills for Employability, Properties of Matter, Separation Processes I, Transfer Processes I.