Leveraging social media data for detection and monitoring of depression
File(s)
Alhamed-F-2025-PhD-Thesis.pdf (23.95 MB)
Thesis
Author(s)
Alhamed, Falwah Abdulaziz
Type
Thesis or dissertation
Abstract
Mental health disorders are increasingly prevalent, with depression being the most common and a major cause of disability and suicide worldwide. Understanding its symptoms, severity, and progression is vital for improving early detection and intervention. This thesis adopts a data-driven AI approach, constructing a large expert-annotated dataset and developing models for monitoring depression from social media language.

We first design a data collection and curation framework to build a large-scale dataset of posts from individuals who self-report depression. In collaboration with psychiatrists and psychologists, we create an annotation scheme for labelling symptoms and severity over time. Experienced psychologists annotate the data, producing DepSy: 40,000 English posts fully annotated for depression symptoms and severity progression, the largest English dataset of its kind. This dataset underpins all subsequent experiments.

We then benchmark multiple NLP approaches to classify posts written before versus after a reported depression diagnosis. Analyses cover linguistic patterns, emotion usage, and content variation. Among the models tested, BERT-based classifiers achieve the best overall performance, while large language models (LLMs) in zero-shot settings perform near random.

Next, we address symptom detection as a multi-label classification problem. A bespoke BERT-based model achieves strong overall results, while a fine-tuned Llama-based model, DepSy-LLaMA, obtains higher recall: it identifies more positive symptom cases, a valuable property in mental health detection. However, LLM predictions remain less reliable for sensitive applications.
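
The recall trade-off mentioned above can be made concrete with a minimal sketch. It assumes micro-averaged precision and recall over sets of symptom labels; the abstract does not say which averaging the thesis reports, and the label names and predictions below are hypothetical, not drawn from DepSy.

```python
def micro_precision_recall(y_true, y_pred):
    """Micro-averaged precision and recall for multi-label predictions.

    y_true, y_pred: lists of sets of labels, one set per post.
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - true_labels)   # labels predicted but absent
        fn += len(true_labels - pred_labels)   # labels present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical gold annotations for two posts.
gold = [{"sleep", "mood"}, {"appetite"}]

# A cautious model misses one symptom; a liberal model over-predicts.
cautious = [{"mood"}, {"appetite"}]
liberal = [{"sleep", "mood", "appetite"}, {"appetite", "fatigue"}]

print(micro_precision_recall(gold, cautious))
print(micro_precision_recall(gold, liberal))
```

In this toy case the liberal model recovers every annotated symptom at the cost of extra false positives, which is the kind of trade-off the paragraph describes.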

Finally, we explore depression severity prediction over time using deep learning and propose a hybrid CTMC-LSTM model that integrates a continuous-time Markov chain (CTMC) with a long short-term memory (LSTM) network to capture temporal patterns. It is the only model that detects severe cases, and it outperforms all baselines. The findings demonstrate the importance of temporal modelling and expert-annotated data for building robust, ethical, and clinically informed systems for depression monitoring from social media.
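
As an illustration of the Markov-chain ingredient only, the sketch below estimates a CTMC generator matrix from timestamped severity labels. It is not the thesis's method: the abstract does not describe how the CTMC is combined with the LSTM, and the integer severity codes, the time unit, and the assumption that a state holds until the next observed post are all hypothetical simplifications.

```python
def estimate_generator(sequences, n_states):
    """Estimate a CTMC generator matrix Q from observed trajectories.

    sequences: list of trajectories, each a time-sorted list of
               (time, state) pairs with states in range(n_states).
    Assumes (hypothetically) that a state persists until the next
    observation, so q_ij = (number of i->j jumps) / (time spent in i).
    """
    jumps = [[0.0] * n_states for _ in range(n_states)]
    holding = [0.0] * n_states
    for seq in sequences:
        for (t0, s0), (t1, s1) in zip(seq, seq[1:]):
            holding[s0] += t1 - t0
            if s1 != s0:
                jumps[s0][s1] += 1

    Q = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if holding[i] == 0:
            continue  # state never observed with a follow-up; leave row at zero
        for j in range(n_states):
            if i != j:
                Q[i][j] = jumps[i][j] / holding[i]
        # Each row of a generator matrix sums to zero.
        Q[i][i] = -sum(Q[i][j] for j in range(n_states) if j != i)
    return Q


# One hypothetical user: severity 0 at day 0, 1 at days 2 and 5, 0 at day 6.
example = [[(0, 0), (2, 1), (5, 1), (6, 0)]]
Q = estimate_generator(example, n_states=2)
print(Q)
```

Rates estimated this way could, for example, be supplied to a sequence model as additional features, but that coupling is an assumption here rather than something the abstract states.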
Version
Open Access
Date Issued
2025-05-01
Date Awarded
2025-11-01
URI
https://hdl.handle.net/10044/1/125096
DOI
https://doi.org/10.25560/125096
Copyright Statement
Attribution-NonCommercial 4.0 International Licence (CC BY-NC)
License URL
https://creativecommons.org/licenses/by/4.0/
Advisor
Ive, Julia
Specia, Lucia
Sponsor
Saudi Arabia Cultural Bureau (Great Britain)
Publisher Department
Department of Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)