Event modelling and recognition in video
Author(s)
Gkalelis, Nikolaos
Type
Thesis or dissertation
Abstract
The management of digital video has become a very challenging problem as the amount of video content continues to grow at a phenomenal rate. This trend necessitates the development of advanced techniques for the efficient and effective manipulation of video information. However, the performance of current video processing tools has not yet reached satisfactory levels, mainly due to the gap between computer-generated semantic descriptions of video content and the interpretations of the same content by humans, a discrepancy commonly referred to as the semantic gap. Inspired by recent studies in neuroscience suggesting that humans remember real life by structuring past experience into events, in this thesis we investigate the use of appropriate models and machine learning approaches for representing and recognizing events in video. Specifically, a joint content-event model is proposed for describing video content (e.g., shots, scenes, etc.) as well as real-life events (e.g., a demonstration, a birthday party, etc.) and their key semantic entities (participants, location, etc.). At the core of this model lies a referencing mechanism that utilizes a set of video analysis algorithms for the automatic generation of event model instances and their enrichment with semantic information extracted from the video content. In particular, subclass discriminant analysis and support vector machine methods are proposed for handling data nonlinearities and addressing several limitations of current state-of-the-art approaches. These approaches are evaluated on several publicly available benchmarks particularly suited to testing the robustness and reliability of nonlinear classification methods, such as the facial image collection of the Four Face database, datasets from the UCI repository, and others.
Moreover, the most efficient of the proposed methods are additionally evaluated on a large-scale video collection, consisting of the datasets provided in the TRECVID multimedia event detection (MED) track of 2010 and 2011, which are among the most challenging in this field, for the tasks of event detection and event recounting. This experiment is designed so that it can also serve as a fundamental evaluation of the proposed joint content-event model.
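The abstract above refers to methods for "handling data nonlinearities" in classification. As a rough illustration of why kernel methods are used for this purpose (this is not the thesis's own algorithm, and the data and parameters are invented for the example), a minimal RBF-kernel ridge classifier separates the classic XOR problem, which no linear classifier can:

```python
import numpy as np

# Toy XOR data: the two classes are not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)

def rbf_kernel(A, B, gamma=2.0):
    """Gram matrix of the Gaussian (RBF) kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel ridge fit: solve (K + lambda*I) alpha = y in the kernel-induced space.
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(X)), y)

def predict(X_new):
    """Classify new points by the sign of the kernel expansion."""
    return np.sign(rbf_kernel(X_new, X) @ alpha)

print(predict(X))  # recovers the XOR labels [-1, 1, 1, -1]
```

The nonlinear decision boundary comes entirely from the kernel; the same idea (implicitly mapping data to a space where classes become separable) underlies the nonlinear SVM and discriminant analysis variants discussed in the abstract.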
Version
Open Access
Date Issued
2013-06
Date Awarded
2014-02
Advisor
Stathaki, Tania
Publisher Department
Electrical and Electronic Engineering
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)