Towards human-like visual reasoning: theory and applications of object-centric learning
Author(s)
Kori, Avinash
Type
Thesis or dissertation
Abstract
Human perception is inherently object-centric: objectness, the hierarchical structuring of environments from fundamental building blocks, forms the foundation of higher-level cognition and systematic generalization. Understanding how parts constitute objects, how objects form scenes, and how scenes create an environment is crucial for reasoning and learning. Object-centric representation learning aims to build AI systems that understand and reason about the world through stable, interpretable, and disentangled object representations. Beyond capturing geometric properties, however, these representations must encode the causal and functional roles of objects within their environment. This thesis focuses on the identifiability of object-centric representations, an essential property for reliable learning and causal inference. Identifiability guarantees that the model converges to a particular local minimum, irrespective of starting point and optimisation strategy. Without identifiability, models risk learning spurious correlations and being affected by confounding factors, selection bias, and other artefacts. To achieve identifiable object-centric representations, this thesis introduces three novel formalisms:
1. CoSA (Unsupervised Conditional Slot Attention): grounds learned representations in understandable physical entities.
2. PSA (Probabilistic Slot Attention): achieves identifiability in object-centric representations.
3. VISA (View-Invariant Slot Attention): addresses the challenges of partial visibility in object representations.
Additionally, this thesis bridges object-centric learning with causal reasoning, providing a theoretical perspective that unifies causal inference and object-centric representations, with broad implications for AI research. Beyond these theoretical contributions, the thesis also explores practical applications of object-centric learning, particularly in explainable AI (XAI). We introduce Free Argumentative eXchanges (FAX), a novel methodology that frames model reasoning as contestation between agents. FAX is initially applied to a discrete representation framework before being extended to object-centric representations. In summary, this thesis advances the field of object-centric learning and explainability by integrating insights from disentangled representation learning and contributing both foundational theory and practical applications for human-aligned decision-making.
Version
Open Access
Date Issued
2025-05-16
Date Awarded
2025-09-01
Copyright Statement
Attribution 4.0 International Licence (CC BY)
Advisor
Toni, Francesca
Glocker, Ben
Locatello, Francesco
Sponsor
UK Research and Innovation
Grant Number
EP/S023356/1
Publisher Department
Department of Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)