Speaker
Description
Representing molecular structures effectively remains a challenging task in chemistry, and both string- and graph-based approaches are commonly employed. Language models and graph-based models are widely used in this domain and consistently achieve state-of-the-art results across a range of tasks. However, SMILES, the prevailing format for chemical compounds in most data sets and many language models, has notable limitations as a training data format. In this study, we present a novel approach that decomposes molecules into substructures and computes descriptor-based representations for these fragments, providing more detailed and chemically relevant input for model training. We train a RoBERTa language model on this substructure and descriptor data and propose a bimodal architecture that integrates it with graph-based models, namely Graph Isomorphism Networks (GIN), Graph Convolutional Networks (GCN), and Graphormer. Our framework shows notable improvements over traditional methods on a variety of tasks, such as Quantitative Structure-Activity Relationship (QSAR) prediction.
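The abstract does not specify which fragmentation scheme or descriptors are used. As a rough illustration of the described preprocessing step, the sketch below decomposes a molecule into substructures with RDKit's BRICS algorithm and computes a small descriptor vector per fragment; both the BRICS choice and the particular descriptors are illustrative assumptions, not the authors' method.

```python
# Minimal sketch of the described pipeline: fragment a molecule into
# substructures, then compute descriptor-based representations per fragment.
# Assumptions: BRICS as the decomposition scheme and four standard RDKit
# descriptors; the talk's actual choices may differ.
from rdkit import Chem
from rdkit.Chem import BRICS, Descriptors

def fragment_descriptors(smiles: str):
    """Decompose a molecule and compute a descriptor record per fragment."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    records = []
    # BRICS breaks the molecule at chemically meaningful bonds;
    # fragments are returned as SMILES with dummy-atom attachment points.
    for frag_smiles in sorted(BRICS.BRICSDecompose(mol)):
        frag = Chem.MolFromSmiles(frag_smiles)
        if frag is None:
            continue
        records.append({
            "fragment": frag_smiles,
            "mol_wt": Descriptors.MolWt(frag),
            "logp": Descriptors.MolLogP(frag),
            "tpsa": Descriptors.TPSA(frag),
            "h_donors": Descriptors.NumHDonors(frag),
        })
    return records

# Example: aspirin
for record in fragment_descriptors("CC(=O)Oc1ccccc1C(=O)O"):
    print(record)
```

Serialized fragment-plus-descriptor records like these could then form the token stream for the language-model branch, while the intact molecular graph feeds the GIN/GCN/Graphormer branch.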