Computational acquisition of knowledge in small-data environments: a case study in the field of energetics

File: OBrien-S-2023-PhD-Thesis.pdf (Thesis, 10.34 MB, Adobe PDF)
Abstract: The UK’s defence industry is accelerating its implementation of artificial intelligence, including expert systems and natural language processing (NLP) tools designed to supplement human analysis. This thesis examines the limitations of NLP tools in small-data environments (common in defence) in the defence-related energetic-materials domain. A literature review identifies the domain-specific challenges of developing an expert system (specifically an ontology). The absence of domain resources such as labelled datasets and, most significantly, the preprocessing of text resources are identified as challenges. To address the latter, a novel general-purpose preprocessing pipeline specifically tailored for the energetic-materials domain is developed. The effectiveness of the pipeline is evaluated. Examination of the interface between using NLP tools in data-limited environments to either supplement or replace human analysis completely is conducted in a study examining the subjective concept of importance. A methodology for directly comparing the ability of NLP tools and experts to identify important points in the text is presented. Results show the participants of the study exhibit little agreement, even on which points in the text are important. The NLP, expert (author of the text being examined) and participants only agree on general statements. However, as a group, the participants agreed with the expert. In data-limited environments, the extractive-summarisation tools examined cannot effectively identify the important points in a technical document akin to an expert. A methodology for the classification of journal articles by the technology readiness level (TRL) of the described technologies in a data-limited environment is proposed. Techniques to overcome challenges with using real-world data such as class imbalances are investigated. A methodology to evaluate the reliability of human annotations is presented. 
Analysis identifies a lack of agreement and consistency in the expert evaluation of document TRL.
Content Version: Open Access
Issue Date: Sep-2022
Date Awarded: Feb-2023
URI: http://hdl.handle.net/10044/1/101917
DOI: https://doi.org/10.25560/101917
Copyright Statement: Creative Commons Attribution NonCommercial NoDerivatives Licence
Supervisor: Proud, William
Department: Physics
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Physics PhD theses
