Synthetic Data Generation for Algorithmic Fairness and Data Cleaning
Our research on synthetic data generation (SDG) focuses on producing bias-free datasets and improving data quality to support more accurate analysis and reasoning. Demand is growing for intelligent systems that make decisions affecting people's lives. Such systems can make discriminatory decisions against minorities, or inaccurate decisions because of poor data quality. The under-representation of specific groups in the training data has been identified as a major cause of biased outcomes in machine learning models. We believe that the development of fair and accurate intelligent systems can benefit from synthetic data generation techniques, as other data analytics applications already do (e.g., object detection in image and video data). For that reason, we study different data generation techniques (statistical or deep learning-based models) and work on developing better models that generate more realistic data examples. Our work centers on:
- Bias identification and quantification.
- Bias mitigation algorithms.
- Generating realistic data examples (what counts as realistic depends on the application).
- Data generation for error detection.
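As an illustration of the first item, bias quantification, the sketch below computes one common fairness measure, the demographic parity difference: the gap between the highest and lowest rate of positive outcomes across groups. It is a minimal example under stated assumptions, not the group's own method; the field names `group` and `label` and the toy records are hypothetical.

```python
from collections import defaultdict

def positive_rates(records, group_key="group", label_key="label"):
    """Return the fraction of positive labels (label == 1) per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in records:
        counts[row[group_key]][0] += 1 if row[label_key] == 1 else 0
        counts[row[group_key]][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def demographic_parity_difference(records, group_key="group", label_key="label"):
    """Gap between the highest and lowest positive rate across groups."""
    rates = positive_rates(records, group_key, label_key)
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: group B is both under-represented (4 rows vs. 10)
# and receives positive labels less often.
records = (
    [{"group": "A", "label": 1}] * 6
    + [{"group": "A", "label": 0}] * 4
    + [{"group": "B", "label": 1}] * 1
    + [{"group": "B", "label": 0}] * 3
)

rates = positive_rates(records)
gap = demographic_parity_difference(records)
print(rates, gap)
```

A gap near zero means the groups receive positive outcomes at similar rates. A mitigation step, for example adding synthetic examples for the under-represented group, would aim to reduce this gap without making the data less realistic.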
Complex Network Analysis
In our research on network science, we focus on understanding and solving problems in domains that are characterized by high interconnectivity. In such domains, the structure of the interactions between entities contains a vast amount of information that is not explicitly stated but that can be revealed through network analysis. To distill this information, we first model the domain as a network in which the nodes are the domain entities of interest and the edges are their relations. This abstraction can then be analyzed with mathematical models. For example, in our current research we analyze network models of financial transactions. By looking at financial interactions from the perspective of a complex network, we aim to improve machine learning models that detect anomalous financial behavior such as financial crime. Another research direction is to understand the core mechanisms that lead financial networks to develop the topologies that we observe. The purpose of this research is to enable the generation of synthetic financial networks that can be used to train models, removing the dependency on the availability of real-world data.
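The modelling step described above can be sketched in a few lines: transactions become directed, weighted edges between account nodes, and simple structural features are read off the resulting network. This is only a sketch under assumptions; the account names, the amounts, and the choice of degree-based features are hypothetical and do not represent the models used in our research.

```python
from collections import defaultdict

# Hypothetical transactions: (sender, receiver, amount)
transactions = [
    ("acc1", "acc2", 120.0),
    ("acc1", "acc3", 80.0),
    ("acc2", "acc3", 50.0),
    ("acc4", "acc3", 300.0),
    ("acc5", "acc3", 40.0),
    ("acc1", "acc2", 30.0),
]

def build_network(transactions):
    """Aggregate transactions into a directed edge map weighted by total amount."""
    edges = defaultdict(float)
    for sender, receiver, amount in transactions:
        edges[(sender, receiver)] += amount
    return dict(edges)

def degree_features(edges):
    """Per-node features: number of in/out neighbours and total in/out amount."""
    feats = defaultdict(
        lambda: {"in_deg": 0, "out_deg": 0, "in_amt": 0.0, "out_amt": 0.0}
    )
    for (sender, receiver), amount in edges.items():
        feats[sender]["out_deg"] += 1
        feats[sender]["out_amt"] += amount
        feats[receiver]["in_deg"] += 1
        feats[receiver]["in_amt"] += amount
    return dict(feats)

edges = build_network(transactions)
features = degree_features(edges)

# Accounts that receive funds from many distinct senders stand out structurally.
for node, f in sorted(features.items(), key=lambda kv: -kv[1]["in_deg"]):
    print(node, f)
```

Features of this kind are not stated in any single transaction record; they only appear once the data is viewed as a network, and they could be passed to an anomaly detection model alongside the usual per-transaction attributes.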
Knowledge Graph Completion with Symbolic and Subsymbolic Approaches
Knowledge graph completion is the task of predicting missing links between the entities of a knowledge graph. The techniques developed for this task also find applications in other areas, including node classification, community detection, and question answering. Broadly, there are two main approaches: (I) The symbolic approach is rule-based: symbols that represent entities and relations are used directly for learning and inference. (II) The subsymbolic approach instead learns vector representations of entities and relations in a latent embedding space and uses them for link prediction. The subsymbolic approach scales well but is less explainable than the symbolic approach. We explore a combined approach that incorporates both symbolic and subsymbolic reasoning to address this limitation.
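The two approaches can be contrasted on a toy example. In the sketch below, the symbolic side is a single hand-written rule, and the subsymbolic side is a translation-based score in the style of TransE, where a triple (head, relation, tail) is plausible when head + relation lies close to tail in the embedding space. Everything here is an assumption made for illustration: the triples, the rule, the hand-set 2-D embeddings (a real model would learn them), and the additive `rule_bonus` used to combine the two signals. It is not the combined approach we are developing.

```python
import math

# Hypothetical knowledge graph as (head, relation, tail) triples.
triples = {
    ("alice", "born_in", "paris"),
    ("paris", "located_in", "france"),
}

def rule_infers(triples, head, tail):
    """Symbolic rule: born_in(x, c) AND located_in(c, y) => citizen_of(x, y)."""
    cities = {t for (h, r, t) in triples if h == head and r == "born_in"}
    return any((c, "located_in", tail) in triples for c in cities)

# Hand-set 2-D embeddings (assumption: a trained model would learn these).
entity = {"alice": (0.0, 0.0), "france": (1.0, 1.0), "japan": (3.0, -1.0)}
relation = {"citizen_of": (1.0, 1.0)}

def transe_score(head, rel, tail):
    """TransE-style score: negative distance between head + relation and tail."""
    translated = tuple(h + r for h, r in zip(entity[head], relation[rel]))
    return -math.dist(translated, entity[tail])

def combined_score(head, rel, tail, rule_bonus=1.0):
    """Illustrative combination: embedding score plus a bonus when the rule fires."""
    bonus = rule_bonus if rule_infers(triples, head, tail) else 0.0
    return transe_score(head, rel, tail) + bonus

# Rank candidate tails for the missing link (alice, citizen_of, ?).
candidates = ["france", "japan"]
ranked = sorted(
    candidates,
    key=lambda t: combined_score("alice", "citizen_of", t),
    reverse=True,
)
print(ranked)
```

The rule gives an answer that can be traced back to two explicit facts, while the embedding score can rank any candidate, including ones that no rule covers. That trade-off between explainability and coverage is what motivates combining the two.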