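7th Autumn School of Chemoinformatics
The 7th Autumn School of Chemoinformatics was held on November 29 and 30, 2022, at the Nara Kasugano International Forum.
The Autumn School of Chemoinformatics, previously held every two years, had been postponed because of the COVID-19 pandemic, but we were able to hold it this year. We invited distinguished researchers in chemoinformatics from Japan and abroad, who lectured on their latest work from fresh perspectives, centered on drug discovery, materials design, and process design. Software vendors also presented the latest software trends at exhibition booths. The school offered an up-to-date view of data-driven chemistry in drug discovery and materials design and prompted lively discussion.
An overview of the event is given below.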
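Date and Time
November 29, 2022, 10:00~16:50
November 30, 2022, 9:30~16:30
Venue
Nara Kasugano International Forum, Noh Theater (Access)
Registration Fee
General participants: 5,000 yen (free for students). Please pay at the reception desk on the day. A participation certificate and a receipt will be issued at reception.
Social Gathering
In view of the spread of COVID-19 and related circumstances, no social gathering will be held. Thank you for your understanding.
Registration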
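Organizer
Data Science Center, Nara Institute of Science and Technology (NAIST)
Supporting Organizations
The Chemical Society of Japan, Division of Chemoinformatics
CBI Society
Society of Computer Chemistry, Japan
Exhibitors
Dassault Systèmes K.K.
Affinity Science Corporation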
OpenEye, Cadence Molecular Sciences
MOLSIS Inc.
World Fusion Co., Ltd.
CONFLEX Corporation
Program
Tuesday, November 29
[Chair: Prof. Alexandre Varnek]
10:00 Opening Remarks: Prof. Kimito Funatsu (NAIST)
10:10~11:10 Prof. Jürgen Bajorath (Bonn University)
“Compound Activity Predictions Through Explainable Machine Learning”
11:10~12:10 Prof. Manabu Sugimoto (Kumamoto University)
“Designable, Explainable, and Interactive Electronic-Structure Informatics”
12:10~13:30 Lunch
[Chair: Prof. Yoshihiro Yamanishi]
13:30~13:45 Dassault Systèmes K.K.
13:45~14:00 Affinity Science Corporation
14:00~14:15 OpenEye, Cadence Molecular Sciences
14:15~14:45 Dr. Swarit Jasial (NAIST)
“Understanding Feature Interpretations of Machine Learning/Regression Models”
14:45~15:10 —— Break ——
[Chair: Prof. Jürgen Bajorath]
15:10~16:10 Prof. Yoshihiro Yamanishi (Kyushu Institute of Technology)
“Data-Driven Drug Discovery and Molecular Design by Machine Learning”
16:10~16:50 Prof. Tomoyuki Miyao (NAIST)
“Global Interpretation of Regression Models for Quantitative Structure-Property Relationship”
Wednesday, November 30
[Chair: Prof. Manabu Sugimoto]
9:30~10:30 Prof. Alexandre Varnek (Strasbourg University)
“Molecular Cartography Approach to Chemical Space Exploration”
10:30~11:30 Prof. Kenji Hori (TS Technology)
“A New Platform for Functional Chemicals Manufacturing Processes with Data Driven Chemistry”
11:30~11:45 MOLSIS Inc.
11:45~12:00 World Fusion Co., Ltd.
12:00~12:15 CONFLEX Corporation
12:15~13:45 Lunch
[Chair: Prof. Kenji Hori]
13:45~14:25 Prof. Naoaki Ono (NAIST)
“Classification of Natural Compounds using Molecular Hypergraph Grammar”
14:25~15:25 Prof. Shigehiko Kanaya (NAIST)
“Data Science toward Understanding of QSAR and Biosynthetic Pathways for Secondary Metabolites”
Closing Remarks: Prof. Kimito Funatsu (NAIST)
[Organizing Committee] Kimito Funatsu (NAIST), Tomoyuki Miyao (NAIST), 野島 秀雄 (NAIST), 福本 路以子 (NAIST)
Contact: Data Science Center, Nara Institute of Science and Technology
e-mail: dsc-info[at]dsc.naist.jp
* Replace [at] with @.
TEL: 0743-72-6056
Short Abstracts by Speaker
Prof. Jürgen Bajorath (Bonn University)
“Compound Activity Predictions Through Explainable Machine Learning”
In the era of deep learning, explainable machine learning (XML) plays an increasingly important role, given that most contemporary ML methods have black box character. In chemoinformatics, various XML methods are adapted or developed for rationalizing molecular property or other ML predictions. For model explanation, quantification of feature importance and visualization play an important role. XML methods are applicable, for example, to explain the results of different compound activity prediction tasks or ‘diagnostic’ ML exercises for hypothesis testing. Features identified to determine predictions can be subjected to follow-up analyses to explore chemical origins of specific activities and support compound design.
Prof. Manabu Sugimoto (Kumamoto University)
“Designable, Explainable, and Interactive Electronic-Structure Informatics”
Chemical information at the electronic level can be considered fundamental to developing functional molecules. We have been proposing “Electronic-structure informatics (ESI)”, in which a small set of electronic descriptors for machine learning is defined based on theories in molecular science and solid-state physics. In the present lecture, we will discuss our recent study on establishing regression models for ESI descriptors using conventional chemoinformatics descriptors. This two-layered machine-learning modeling is expected to be useful not only for the explanation (interpretation) of machine learning models but also for the structural design of functional molecules. It should also support productive collaboration with experimental chemists through local (atom-based) mappings of important descriptors.
Dr. Swarit Jasial (NAIST)
“Understanding Feature Interpretations of Machine Learning/Regression Models”
Understanding model decisions is challenging but of critical importance for guiding compound design. Several local explanation methods, such as feature weighting and Shapley additive explanations (SHAP), help in rationalizing activity predictions of any machine learning algorithm. Moreover, in the field of chemoinformatics, feature interpretations can provide additional validation of compounds in terms of chemical intuitiveness. In this lecture, different feature interpretation methods for individual predictions are reviewed. We focus on two case studies in which interpretable machine learning and regression models are generated for activity cliff predictions and antiviral activity predictions, respectively.
Prof. Yoshihiro Yamanishi (Kyushu Institute of Technology)
“Data-driven drug discovery and molecular design by machine learning”
Biomedical big data are useful resources for drug discovery. In this study, we developed novel machine learning methods to predict therapeutic targets of diseases, to search for drug candidate molecules, and to design new drug chemical structures, by integrating various biomedical data on compounds (e.g., chemical structures, bioactivity) and diseases (e.g., disease-causing genes, gene expression profiles). In the symposium, we will show some applications to therapeutic target identification, combination therapy, and drug design.
Prof. Tomoyuki Miyao (NAIST)
“Global Interpretation of Regression Models for Quantitative Structure-Property Relationship”
Global interpretation of regression models can provide a comprehensive understanding of the relationship between molecular descriptors and a target property/activity within a given data set. Multiple linear regression models are self-explanatory, although correlation among descriptors should be carefully monitored. Feature importance is frequently calculated for black-box models, but it cannot tell how a feature contributes to the model. In this lecture, methods for making interpretable regression models are reviewed.
Prof. Alexandre Varnek (Strasbourg University)
“Molecular Cartography Approach to Chemical Space Exploration”
This lecture describes the application of the molecular cartography approach to various chemoinformatics tasks: (i) chemical data visualization and analysis, (ii) prediction of properties or biological activities, (iii) comparison of (ultra) large chemical databases, (iv) target identification, and (v) virtual screening. In combination with a SMILES- or graph-based autoencoder, cartography can be used efficiently for the automated generation of chemical structures with desired biological activities and of novel chemical transformations.
Prof. Kenji Hori (TS Technology)
“A New Platform for Functional Chemicals Manufacturing Processes with Data Driven Chemistry”
Although materials informatics offers candidates for functional chemicals, it is generally very difficult to synthesize them. We have been developing a new platform connecting computational chemistry with synthesis route development systems such as AIPHOS/TOSP in order to give a solution to the problem. Moreover, we have to confirm in experiments that synthesis target molecules are produced by using the routes developed. It is also important to develop their manufacturing process. In the present study, we will present an overview of the national project for functional chemicals manufacturing processes with data driven chemistry.
Prof. Naoaki Ono (NAIST)
“Classification of natural compounds using molecular hypergraph grammar”
Molecular hypergraph grammar (MHG) is a powerful model for representing chemical structures using a finite set of symbols. In this study, we constructed a neural network that encodes chemical structures in a latent space using unsupervised training based on an autoencoder, and classified the 19,769 molecules in the KNApSAcK database into 73 groups according to their metabolic biosynthesis pathways. The results showed that the features extracted using MHG can represent the constraints of the metabolic pathways of biological reactions.
Prof. Shigehiko Kanaya (NAIST)
“Data Science toward understanding of QSAR and biosynthetic pathways for secondary metabolites”
Databases, including natural product databases, are created for many purposes. A natural product database supports diverse research in metabolomics, e.g., the assignment of mass spectrometry (MS) peaks to metabolites. In general, databases can be utilized for knowledge discovery through the systematization or mining of the accumulated data. To attain these purposes, both the creation of databases and the development of mining techniques are essential in different scientific fields. KNApSAcK Family DB (http://www.knapsackfamily.com/KNApSAcK_Family/) is a set of databases associated with natural products and organisms. Currently, KNApSAcK Core DB consists of 157,144 metabolite-species pairs encompassing 62,646 metabolites and 24,704 species according to the last update in 2022. It is predicted that the set of secondary metabolites synthesized by plants, fungi and microorganisms includes approximately 3000 terpenoids, 9000 flavonoids, 1600 isoflavonoids and 12,000 alkaloids. The number of metabolites included in KNApSAcK Core DB is much larger than this predicted number.
In the symposium, we briefly discuss the major databases in the KNApSAcK Family DB and applications of KNApSAcK DB such as QSAR and the classification of metabolites by alkaloid metabolic pathway using deep learning methods such as graph convolutional neural networks (GCNN).