PhD defence of G. Kapidis MSc

PhD defence: A Modular Approach for the Detection and Interconnection of Objects, Hands, Locations, and Actions for Egocentric Video Understanding

The topic of this dissertation is the structured and automatic analysis and understanding of egocentric (first-person) videos with respect to the actions performed by the camera wearer. Perhaps the most distinguishing characteristic of the egocentric perspective is that it provides an information-rich view of the scene as experienced by the person holding the camera. The resulting scenes are often indicative of the location of the persons involved and the activities they undertake. Recognition is based on high-level information, such as the hands of the camera wearer and the objects being manipulated, as well as low-level features made available through data-driven learning methods.

In this thesis, we use deep convolutional neural networks trained on egocentric images, video segments, and/or (a)synchronously acquired high-level features of the scene as the backbone of action classification models. We demonstrate that the training process and architecture of the models are critical to their success, a topic we investigate largely through multitask learning, measuring the effect of a variety of learnable outputs on the final action recognition result. We additionally pursue the simultaneous combination of video data from a variety of sources. In the context of this thesis, this is called multi-dataset multitask learning and refers to a novel way to combine related and unrelated data sources to improve egocentric action recognition quality.
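The multitask setup described above can be illustrated with a minimal sketch: a single shared backbone produces one feature vector per frame, several task heads (here hypothetically named action, object, and hand) score their own label spaces, and training minimizes a weighted sum of per-task losses. All names, dimensions, and weights below are illustrative assumptions, not values from the dissertation, and a stand-in function replaces the convolutional backbone.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, in_dim = 16, 32

# Parameters: one shared projection plus one linear head per task.
W_backbone = rng.normal(size=(in_dim, feat_dim))
heads = {
    "action": rng.normal(size=(feat_dim, 10)),  # e.g. 10 action classes
    "object": rng.normal(size=(feat_dim, 20)),  # e.g. 20 object classes
    "hand":   rng.normal(size=(feat_dim, 4)),   # e.g. 4 hand configurations
}
task_weights = {"action": 1.0, "object": 0.5, "hand": 0.5}  # illustrative

def shared_backbone(frame):
    """Stand-in for a convolutional backbone: frame features -> shared vector."""
    return np.tanh(frame @ W_backbone)

def cross_entropy(scores, label):
    """Softmax cross-entropy loss for a single example."""
    z = scores - scores.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

# One training example: a frame descriptor with a label for every task.
frame = rng.normal(size=in_dim)
labels = {"action": 3, "object": 7, "hand": 1}

features = shared_backbone(frame)
total_loss = sum(
    task_weights[task] * cross_entropy(features @ W, labels[task])
    for task, W in heads.items()
)
print(f"combined multitask loss: {total_loss:.3f}")
```

Because every head backpropagates through the same backbone, each auxiliary task acts as a regularizer on the shared features; measuring how each head's weight affects the action head's accuracy is the kind of ablation the abstract alludes to.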

Start date and time
End date and time
Location
Academiegebouw, Domplein 29 & online (link)
PhD candidate
G. Kapidis MSc
Dissertation
A Modular Approach for the Detection and Interconnection of Objects, Hands, Locations, and Actions for Egocentric Video Understanding
PhD supervisor(s)
prof. dr. R.C. Veltkamp
Co-supervisor(s)
dr. ir. R.W. Poppe