Thesis by N.-M. Aliman MSc

PhD defence: Hybrid Cognitive-Affective Strategies for AI Safety

Artificial intelligence (AI) can have great beneficial impacts on society, for example as a tool for more efficient policy making and law enforcement or as a tool for promoting human well-being. However, AI also poses substantial risks, such as the failure to align it with human ethical values or the empowerment of (cyber-)crime, among many others. These risks are addressed in the recently emerged field of AI safety. This theoretical and analytical thesis provides an in-depth transdisciplinary examination of how to address AI risks with the aid of scientifically grounded, hybrid cognitive-affective strategies.

The identified strategies are hybrid because, in a human-centered approach to this broad issue, AI systems cannot be analysed in isolation. The nature of human entities and the properties of human-machine interactions have to be considered within a socio-technological framework, taking into account the inherently affective nature of human cognition. Using a cybersecurity-oriented approach that considers not only unintentional failures but also intentional malice, the thesis identifies short-term and long-term strategies and covers AI governance as well as AI engineering requirements.

The thesis considers two kinds of systems: hypothetical, not-yet-existing AI systems that are able to consciously create and understand explanatory knowledge (Type II), and current AI systems, which cannot (Type I). It analyses the meaningful control of Type I AI systems in detail, taking autonomous vehicles as an example use case, but also touches on Type II systems. The thesis also introduces the AI safety paradox, which states that value alignment and control represent conjugate requirements in AI safety. Whilst the value alignment problem addresses the question of how to build AI systems that are aligned with human ethical values, the control problem concerns how to implement AI systems that will not harm humans.

