PhD defence: Improving sample efficiency of reinforcement learning: Exploiting structural knowledge for decision making


Reinforcement learning (RL) has achieved remarkable progress in recent years, yet its application in real-world tasks is hindered by poor sample efficiency, especially in structurally complex environments.

This thesis investigates how structural knowledge, including subtask composition, symbolic reasoning, communication structure, and agent influence, can be exploited to improve the sample efficiency of single-agent and multi-agent RL algorithms.

First, we introduce a hierarchical RL framework that automatically structures subtasks. By jointly learning high-level subtask selection and low-level subtask execution, the method achieves superior performance in sparse-reward environments.

Second, we propose a neuro-symbolic RL framework that integrates probabilistic symbolic reasoning with policy learning. By introducing a probabilistic inference module that computes action precondition masks, the framework excludes infeasible actions via symbolic knowledge, thereby improving both sample efficiency and policy safety.

Third, we present a multi-agent RL framework that exploits communication structure through decentralized scheduling of sparse communication. Agents learn when to share local messages by predicting others' messages, leading to improved performance with reduced communication overhead.

Finally, we design a multi-agent RL framework that automatically identifies the state dimensions controllable by each agent. This structural insight enables focused exploration and precise credit assignment in cooperative multi-agent scenarios with sparse rewards.
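To make the precondition-masking idea concrete, the following is a minimal illustrative sketch, not the thesis's actual method: symbolic preconditions are evaluated against the current state to produce a boolean mask, and the policy's action distribution is renormalized over feasible actions only. All names (`precondition_mask`, `masked_policy`, the toy key-door state) are hypothetical.

```python
import numpy as np

def precondition_mask(state, preconditions):
    """Boolean mask over actions: True where every symbolic
    precondition of that action holds in the current state."""
    return np.array([all(p(state) for p in pres) for pres in preconditions])

def masked_policy(logits, mask):
    """Renormalize a softmax policy over feasible actions only."""
    logits = np.where(mask, logits, -np.inf)  # infeasible actions get zero probability
    exp = np.exp(logits - logits[mask].max())  # subtract max for numerical stability
    return exp / exp.sum()

# Toy example: 3 actions; action 2 ("open door") requires holding a key.
state = {"has_key": False}
preconditions = [
    [],                        # action 0: always feasible
    [],                        # action 1: always feasible
    [lambda s: s["has_key"]],  # action 2: needs the key
]
mask = precondition_mask(state, preconditions)   # [True, True, False]
probs = masked_policy(np.array([0.5, 1.0, 2.0]), mask)
```

Because infeasible actions are never sampled, exploration is concentrated on actions that can actually succeed, which is one intuition behind the sample-efficiency and safety gains described above.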

Together, these contributions advance the sample efficiency of RL by systematically exploiting structural knowledge in decision-making processes. The results across diverse domains demonstrate that the proposed methods outperform state-of-the-art baselines.

Location
Hybrid: online (livestream link) and for invited guests in the Utrecht University Hall, Domplein 29
PhD candidate
S. Han
Dissertation
Improving sample efficiency of reinforcement learning: Exploiting structural knowledge for decision making
PhD supervisor(s)
prof. dr. M.M. Dastani
Co-supervisor(s)
dr. S. Wang
More information
Full text via Utrecht University Repository