publications | Ping Yang

2025

Machine Learning-Based Predictions of Henry Coefficients for Long-Chain Alkanes in One-Dimensional Zeolites: Application to Hydroisomerization

Shrinjay Sharma, Ping Yang, Yachan Liu, Kevin Rossi, Peng Bai , and 7 more authors

J. Phys. Chem. C, 2025

Abs HTML

Shape-selective adsorption in zeolites plays a pivotal role in catalytic hydroisomerization of long-chain alkanes, a key process in producing sustainable aviation fuels from Fischer− Tropsch products. Accurately predicting adsorption behavior for the large number of alkane isomers in different zeolite frameworks is computationally intensive. To address this, we have developed a machine learning framework that rapidly and accurately predicts Henry coefficients of linear (C1−C30) and branched (C4−C20) alkanes in one-dimensional zeolites. Using descriptors based on chain length, branching patterns, and molecular graphs, we evaluate multiple ML models, including Random Forest, XGBoost, CatBoost, TabPFN, and D-MPNN in MTT-, MTW-, MRE-, and AFI-type zeolites. TabPFN and D-MPNN offer the highest predictive accuracy. Active learning further boosts model performance by efficiently selecting diverse and structurally informative isomers. We also uncover activity cliffs, where small changes in molecular structure lead to sharp variations in adsorption, and demonstrate that targeted oversampling of these cases improves model robustness. Finally, we combine the ML-predicted Henry coefficients with gas-phase thermodynamics to compute reaction equilibrium distributions for C16 hydroisomerization. This integrated, data-driven approach enables efficient screening and design of shape-selective zeolite catalysts, thereby reducing the need for costly simulations.

2022

Classifying the Toxicity of Pesticides to Honey Bees via Support Vector Machines with Random Walk Graph Kernels

Ping Yang, E. Adrian Henle, Xiaoli Z. Fern, and Cory M. Simon

J. Chem. Phys., 2022

Abs HTML

Pesticides benefit agriculture by increasing crop yield, quality, and security. However, pesticides may inadvertently harm bees, which are valuable as pollinators. Thus, candidate pesticides in development pipelines must be assessed for toxicity to bees. Leveraging a dataset of 382 molecules with toxicity labels from honey bee exposure experiments, we train a support vector machine (SVM) to predict the toxicity of pesticides to honey bees. We compare two representations of the pesticide molecules: (i) a random walk feature vector listing counts of lengthL walks on the molecular graph with each vertex- and edge-label sequence and (ii) the Molecular ACCess System (MACCS) structural key fingerprint (FP), a bit vector indicating the presence/absence of a list of pre-defined subgraph patterns in the molecular graph. We explicitly construct the MACCS FPs but rely on the fixed-length-L random walk graph kernel (RWGK) in place of the dot product for the random walk representation. The L-RWGK-SVM achieves an accuracy, precision, recall, and F1 score (mean over 2000 runs) of 0.81, 0.68, 0.71, and 0.69, respectively, on the test data set—with L = 4 being the mode optimal walk length. The MACCS-FP-SVM performs on par/marginally better than the L-RWGK-SVM, lends more interpretability, but varies more in performance. We interpret the MACCS-FP-SVM by illuminating which subgraph patterns in the molecules tend to strongly push them toward the toxic/non-toxic side of the separating hyperplane.