Data Science Center

2022.11.08
seminar

7th Autumn School of Chemoinformatics in Nara, 2022

Time & Date

10:00~16:50, November 29, 2022
9:30~16:30, November 30, 2022

Venue

Nara Kasugano International Forum 甍 IRAKA (Access)

Fee

Regular: JPY 5,000 (free for students)
Please pay at the entrance.

Registration

Please click here for registration

Host

Data Science Center
Nara Institute of Science and Technology (NAIST)

Co-sponsors

Division of Chemoinformatics, The Chemical Society of Japan
CBI Society
Society of Computer Chemistry, Japan

Exhibitors

DASSAULT SYSTEMES
AFFINITY SCIENCE CORPORATION
OpenEye, Cadence Molecular Sciences
MOLSIS Inc.
World Fusion Co., Ltd.
CONFLEX Corporation

Program
November 29

[Chair: Prof. Alexandre Varnek]

10:00          Opening Remarks: Prof. Kimito Funatsu (NAIST)

10:10~11:10    Prof. Jürgen Bajorath (Bonn University)
               “Compound Activity Predictions Through Explainable Machine Learning”

11:10~12:10    Prof. Manabu Sugimoto (Kumamoto University)
               “Designable, Explainable, and Interactive Electronic-Structure Informatics”

12:10~13:30    Lunch

[Chair: Prof. Yoshihiro Yamanishi]

13:30~13:45    DASSAULT SYSTEMES

13:45~14:00    AFFINITY SCIENCE CORPORATION

14:00~14:15    OpenEye, Cadence Molecular Sciences

14:20~14:50    Dr. Swarit Jasial (NAIST)
               “Understanding Feature Interpretations of Machine Learning/Regression Models”

14:50~15:10    Break

[Chair: Prof. Jürgen Bajorath]

15:10~16:10    Prof. Yoshihiro Yamanishi (Kyushu Institute of Technology)
               “Data-driven drug discovery and molecular design by machine learning”

16:10~16:50    Prof. Tomoyuki Miyao (NAIST)
               “Global Interpretation of Regression Models for Quantitative Structure-Property Relationship”

November 30

[Chair: Prof. Manabu Sugimoto]

9:30~10:30     Prof. Alexandre Varnek (Strasbourg University)
               “Molecular Cartography Approach to Chemical Space Exploration”

10:30~11:30    Prof. Kenji Hori (TS Technology)
               “A New Platform for Functional Chemicals Manufacturing Processes with Data Driven Chemistry”

11:30~11:45    MOLSIS Inc.

11:45~12:00    World Fusion Co., Ltd.

12:00~12:15    CONFLEX Corporation

12:15~13:45    Lunch

[Chair: Prof. Kenji Hori]

14:45~15:25    Prof. Naoaki Ono (NAIST)
               “Classification of natural compounds using molecular hypergraph grammar”

15:25~16:25    Prof. Shigehiko Kanaya (NAIST)
               “Data Science toward understanding of QSAR and biosynthetic pathways for secondary metabolites”

               Closing Remarks: Prof. Kimito Funatsu (NAIST)

[Organizing Committee] Kimito Funatsu (NAIST), Tomoyuki Miyao (NAIST), Hideo Nojima (NAIST), Ruiko Fukumoto (NAIST)

Contact: Data Science Center, Nara Institute of Science and Technology (NAIST)
e-mail:  dsc-info[at]dsc.naist.jp
         *Please replace [at] with @
TEL:     0743-72-6056

Short Abstracts

Prof. Jürgen Bajorath (Bonn University)

“Compound Activity Predictions Through Explainable Machine Learning”

In the era of deep learning, explainable machine learning (XML) plays an increasingly important role, given that most contemporary ML methods have black box character. In chemoinformatics, various XML methods are adapted or developed for rationalizing molecular property or other ML predictions. For model explanation, quantification of feature importance and visualization play an important role. XML methods are applicable, for example, to explain the results of different compound activity prediction tasks or ‘diagnostic’ ML exercises for hypothesis testing. Features identified to determine predictions can be subjected to follow-up analyses to explore chemical origins of specific activities and support compound design.

Prof. Manabu Sugimoto (Kumamoto University)

“Designable, Explainable, and Interactive Electronic-Structure Informatics”

Chemical information at the electronic level can be considered fundamental to the development of functional molecules. We have been proposing “electronic-structure informatics (ESI)”, in which a small set of electronic descriptors for machine learning is defined based on theories in molecular science and solid-state physics. In this lecture, we will discuss our recent study on establishing regression models for ESI descriptors using conventional chemoinformatics descriptors. This two-layered machine-learning modeling is expected to be useful not only for the explanation (interpretation) of machine learning models but also for the structural design of functional molecules. It should also help foster productive collaboration with experimental chemists through local (atom-based) mappings of important descriptors.

Dr. Swarit Jasial (NAIST)

“Understanding Feature Interpretations of Machine Learning/Regression Models”

Understanding model decisions is challenging but of critical importance for guiding compound design. Several locally interpretable explanatory methods, such as feature weighting and Shapley additive explanations (SHAP), help in rationalizing the activity predictions of any machine learning algorithm. Moreover, in the field of chemoinformatics, feature interpretations can provide additional validation of compounds in terms of chemical intuitiveness. In this lecture, different methods of feature interpretation are reviewed for individual predictions. We focus on two case studies in which interpretable machine learning and regression models are generated for activity cliff predictions and antiviral activity predictions, respectively.

Prof. Yoshihiro Yamanishi (Kyushu Institute of Technology)

“Data-driven drug discovery and molecular design by machine learning”

Biomedical big data are useful resources for drug discovery. In this study, we developed novel machine learning methods to predict therapeutic targets of diseases, to search for drug candidate molecules, and to design new drug chemical structures, by integrating various biomedical data on compounds (e.g., chemical structures, bioactivity) and diseases (e.g., disease-causing genes, gene expression profiles). In the symposium, we will show some applications to therapeutic target identification, combination therapy, and drug design.

Prof. Tomoyuki Miyao (NAIST)

“Global Interpretation of Regression Models for Quantitative Structure-Property Relationship”

Global interpretation of regression models can provide a comprehensive understanding of the relationship between molecular descriptors and a target property/activity within a given data set. Multiple linear regression models are self-explanatory, although correlation among descriptors should be carefully monitored. Feature importance is frequently calculated for black-box models, but it cannot tell how a feature contributes to the model. In this lecture, methods for making interpretable regression models are reviewed.

Prof. Alexandre Varnek (Strasbourg University)

“Molecular Cartography Approach to Chemical Space Exploration”

This lecture describes applications of the molecular cartography approach to various chemoinformatics tasks: (i) chemical data visualization and analysis, (ii) prediction of properties or biological activities, (iii) comparison of (ultra) large chemical databases, (iv) target identification, and (v) virtual screening. In combination with a SMILES- or graph-based autoencoder, cartography can be used efficiently for the automated generation of chemical structures with desired biological activities and of novel chemical transformations.

Prof. Kenji Hori (TS Technology)

“A New Platform for Functional Chemicals Manufacturing Processes with Data Driven Chemistry”

Although materials informatics offers candidates for functional chemicals, it is generally very difficult to synthesize them. To address this problem, we have been developing a new platform that connects computational chemistry with synthesis route development systems such as AIPHOS/TOSP. Moreover, we have to confirm experimentally that the target molecules are actually produced by the routes developed. It is also important to develop their manufacturing processes. We will present an overview of the national project for functional chemicals manufacturing processes with data driven chemistry.

Prof. Naoaki Ono (NAIST)

“Classification of natural compounds using molecular hypergraph grammar”

Molecular hypergraph grammar (MHG) is a powerful model for representing chemical structures using a finite set of symbols. In this study, we constructed a neural network that encodes chemical structures in a latent space using unsupervised training based on an autoencoder, and classified the 19,769 molecules in the KNApSAcK database into 73 groups according to their metabolic biosynthesis pathways. The results showed that the features extracted using MHG can represent the constraints of the metabolic pathways of biological reactions.

Prof. Shigehiko Kanaya (NAIST)

“Data Science toward understanding of QSAR and biosynthetic pathways for secondary metabolites”

Databases, including natural product databases, are created for many purposes. A natural product database supports a wide range of research in metabolomics, e.g., the assignment of mass spectral (MS) peaks to metabolites. In general, databases can be used for knowledge discovery through the systematization or mining of the accumulated data. To attain these purposes, both the creation of databases and the development of mining techniques are essential in different scientific fields. KNApSAcK Family DB (http://www.knapsackfamily.com/KNApSAcK_Family/) is a set of databases associated with natural products and organisms. Currently, KNApSAcK Core DB consists of 157,144 metabolite-species pairs encompassing 62,646 metabolites and 24,704 species, according to the last update in 2022. It is predicted that the set of secondary metabolites synthesized by plants, fungi and microorganisms includes approximately 3000 terpenoids, 9000 flavonoids, 1600 isoflavonoids and 12,000 alkaloids. The number of metabolites included in KNApSAcK Core DB far exceeds this predicted number.
In the symposium, we briefly discuss the major databases in the KNApSAcK Family DB and applications of the KNApSAcK DB, such as QSAR and the classification of metabolites according to alkaloid metabolic pathways using deep learning methods such as Graph Convolution Neural Networks (GCNN).