Details about my PhD (research, thesis, its evaluation, and its impact)

Inclusion of Symbolic Domain-Knowledge into Deep Neural Networks

Tirtharaj Dash, BITS Pilani, Goa Campus

Supervisor: Ashwin Srinivasan, Senior Professor, BITS Pilani, Goa Campus
Co-supervisor: Sukanta Mondal, Associate Professor, BITS Pilani, Goa Campus


This thesis is concerned with techniques for the inclusion of domain-knowledge into Deep Neural Networks (DNNs). We are primarily concerned with real-world scientific problems with the following characteristics: (a) the data are naturally graph-structured (relational); (b) the amount of data available is typically small; and (c) there is significant domain-knowledge, usually expressed in some logical form (rules, taxonomies, constraints and the like). Broadly, there are three ways in which domain-knowledge can be incorporated into a DNN: by changing the input representation, by changing the loss function, or by changing the model (structure and parameters). We propose techniques for the inclusion of domain-knowledge into DNNs that change the input representation. In particular, our principal contributions are as follows: (1) We study the inclusion of complex domain-knowledge into Multilayer Perceptrons (MLPs) using relational features and propositionalisation. We propose a utility-based stochastic sampling technique for drawing features from a large but countable space of relational features; (2) We propose a simplified technique called ‘vertex-enrichment’ for incorporating symbolic domain-knowledge into graph neural networks (GNNs), the class of deep neural networks that deal with graph-structured data; (3) We propose a systematic technique to incorporate symbolic domain-knowledge into GNNs using the method of inverse entailment available in Inductive Logic Programming (ILP); and (4) We construct a sequence generation system using a modular combination of two deep generative models and a discriminator model based on (3), and use this system for a problem of early-stage lead discovery in drug design.
Our implementations combine neural networks with symbolic representations, resulting in new neuro-symbolic models: Deep Relational Machines (DRMs), Vertex-Enriched Graph Neural Networks (VEGNNs), Bottom-Graph Neural Networks (BotGNNs), and a modular end-to-end neuro-symbolic system for the generation of novel molecules for drug design. Our primary hypothesis is that the inclusion of domain-knowledge can significantly improve the performance of a deep neural network. We conduct large-scale empirical testing of this hypothesis, using nearly 75 datasets in the broad area of drug discovery that consist of over 200,000 relational data instances, with domain-knowledge containing about 100 relations. In all cases, our empirical evidence supports the primary hypothesis and encourages the inclusion of domain-knowledge into deep neural networks for prediction and explanation.
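
As a rough illustration of the idea of changing the input representation in contribution (2), the sketch below appends to each vertex's feature vector a multi-hot indicator of the domain relations that the vertex takes part in. This reading of vertex-enrichment is an assumption made for illustration, not the thesis implementation; the function `enrich_vertices`, the relation names and the toy graph are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vertex = int
RelationInstance = Tuple[str, Sequence[Vertex]]  # (relation name, vertices involved)


def enrich_vertices(
    features: Dict[Vertex, List[float]],
    relation_names: List[str],
    instances: List[RelationInstance],
) -> Dict[Vertex, List[float]]:
    """Append a multi-hot relation-membership vector to each vertex's features.

    Illustrative assumption: position i of the appended vector is 1.0 if the
    vertex occurs in at least one instance of relation_names[i], else 0.0.
    """
    index = {name: i for i, name in enumerate(relation_names)}
    # Start with the original features followed by an all-zero membership vector.
    enriched = {
        v: list(x) + [0.0] * len(relation_names) for v, x in features.items()
    }
    for name, vertices in instances:
        for v in vertices:
            # The membership block starts right after the original features.
            enriched[v][len(features[v]) + index[name]] = 1.0
    return enriched


# Toy example: three atoms with 2-dimensional one-hot atom-type features.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}
relation_names = ["ring", "hydroxyl"]  # hypothetical domain relations
instances = [("ring", (0, 1)), ("hydroxyl", (1, 2))]

enriched = enrich_vertices(features, relation_names, instances)
for v in sorted(enriched):
    print(v, enriched[v])
```

The enriched vectors can then be used as the vertex features of any standard GNN, which is what makes this a change to the input rather than to the loss function or the model.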


The final thesis is here, and the slide deck of my defence seminar is here. The Shodhganga page of the thesis can be accessed here.


I sincerely thank the thesis examiners for their thorough evaluation of my thesis and gratefully acknowledge the time they devoted to my PhD defence:

Sriraam Natarajan, University of Texas at Dallas
Filip Zelezny, Czech Technical University in Prague

Some excerpts from the reviews:
  • “I truly like this fantastic thesis. It is original, addresses several important tasks, and moves the applicability of the field of … into new directions rarely seen before by the field.”

  • “… the techniques are adapted to a drug design problem and the results are both intuitive and compelling.”

  • “Overall, an excellent thesis that addresses some of the most interesting problems in AI …”

  • “The thesis addresses one of the most important problems identified in modern AI …”

  • “The main hypothesis of the work - … - is repeatedly confirmed by extensive experiments throughout the thesis.”

  • “The structure of the thesis is neat and logical. …”

  • “In general, the author has made significant contributions to the problem, … to the field of neural-symbolic integration and relational learning.”


I won the Best PhD Thesis Award from my university.