
Principled Bayesian modeling and statistical learning with non-representative data in astrophysics

File: Autenrieth-M-2023-PhD-Thesis.pdf
Description: Thesis
Size: 19.74 MB
Format: Adobe PDF
Title: Principled Bayesian modeling and statistical learning with non-representative data in astrophysics
Authors: Autenrieth, Maximilian
Item Type: Thesis or dissertation
Abstract: This thesis tackles the fundamental issue of non-representative data in astrophysics by developing and applying methodology from statistical machine learning, Bayesian statistics, and causal inference. The aims are to handle big data efficiently, to allow for probabilistic and principled parameter estimation with proper uncertainty quantification, and to deal with systematic uncertainties and biases in the data collection process. To enable (a) statistically principled, (b) scientifically justified, and (c) computationally efficient analysis of non-representative, complex astrophysical data, the thesis provides both novel general-purpose statistical methodology and statistical methodology tailored to topical scientific problems in cosmology and high-energy astrophysics, grouped into three related projects:
(i) We propose a simple, statistically principled, and theoretically justified general-purpose method, StratLearn, to improve supervised learning when the training set is not representative, a situation known as covariate shift. Building upon a well-established methodology in causal inference, we show that the effects of covariate shift can be reduced or eliminated by conditioning on propensity scores. We demonstrate that fitting learners within strata constructed on the estimated propensity scores improves upon state-of-the-art importance weighting methods on two topical scientific tasks: conditional density estimation of galaxy redshift (photo-z) and photometric classification of type Ia supernovae (SNIa).
(ii) We improve weak lensing photo-z calibration via Bayesian hierarchical modeling of full galaxy photo-z conditional density estimates obtained within StratLearn. We substantially improve the galaxy tomographic bin assignment and obtain almost unbiased estimates of target population means within tomographic bins.
(iii) We propose a science-driven hierarchical Bayesian framework to estimate the galaxy luminosity distribution in X-rays, combining non-representative X-ray and optical surveys. The framework accounts for incompleteness bias by incorporating into the model an X-ray incompleteness function (estimated from simulations) and an optical incompleteness function (with parameters learned from the observed data). This allows for improved recovery of the luminosity function even with high proportions of systematic incompleteness. The framework is evaluated on simulations and applied to data from the Chandra Deep Field South (CDFS).
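The stratification idea described in project (i) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the gradient-descent logistic regression used to estimate propensity scores, the choice of five quantile-based strata on the pooled scores, the linear least-squares learner, and the fallback for sparse strata are all assumptions made here for illustration. The data are synthetic.

```python
import numpy as np

def fit_logistic(Z, s, n_iter=500, lr=0.5):
    """Gradient-descent logistic regression (assumed estimator); returns weights."""
    Zb = np.column_stack([np.ones(len(Z)), Z])
    w = np.zeros(Zb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Zb @ w))
        w -= lr * Zb.T @ (p - s) / len(s)
    return w

def propensity_scores(X_source, X_target):
    """Estimated probability that a point belongs to the source (training) set."""
    X = np.vstack([X_source, X_target])
    s = np.concatenate([np.ones(len(X_source)), np.zeros(len(X_target))])
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize covariates
    w = fit_logistic(Z, s)
    e = 1.0 / (1.0 + np.exp(-(np.column_stack([np.ones(len(Z)), Z]) @ w)))
    return e[: len(X_source)], e[len(X_source):]

def stratified_predict(X_source, y_source, X_target, n_strata=5):
    """Fit one linear learner per propensity-score stratum; predict target points."""
    e_src, e_tgt = propensity_scores(X_source, X_target)
    # Strata edges: quantiles of the pooled scores (an assumption of this sketch).
    edges = np.quantile(np.concatenate([e_src, e_tgt]),
                        np.linspace(0.0, 1.0, n_strata + 1))
    s_src = np.clip(np.searchsorted(edges, e_src, side="right") - 1, 0, n_strata - 1)
    s_tgt = np.clip(np.searchsorted(edges, e_tgt, side="right") - 1, 0, n_strata - 1)
    preds = np.full(len(X_target), np.nan)
    for k in range(n_strata):
        tgt_k = s_tgt == k
        if not tgt_k.any():
            continue
        src_k = s_src == k
        if src_k.sum() < X_source.shape[1] + 1:
            # Simplified fallback for sparse strata: use all source data.
            src_k = np.ones(len(X_source), dtype=bool)
        A = np.column_stack([np.ones(src_k.sum()), X_source[src_k]])
        coef, *_ = np.linalg.lstsq(A, y_source[src_k], rcond=None)
        B = np.column_stack([np.ones(tgt_k.sum()), X_target[tgt_k]])
        preds[tgt_k] = B @ coef
    return preds, s_tgt

# Synthetic covariate shift: source and target covariates differ in location.
rng = np.random.default_rng(0)
X_source = rng.normal(0.0, 1.0, size=(300, 1))
X_target = rng.normal(1.0, 1.0, size=(200, 1))
y_source = np.sin(X_source[:, 0]) + rng.normal(0.0, 0.1, size=300)
preds, strata = stratified_predict(X_source, y_source, X_target)
```

Within each stratum the source and target points have similar estimated propensity scores, so a learner fitted on the source points of a stratum is applied only to target points that resemble them. Any supervised learner could replace the least-squares fit used here.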
Content Version: Open Access
Issue Date: Aug-2023
Date Awarded: Nov-2023
URI: http://hdl.handle.net/10044/1/108121
DOI: https://doi.org/10.25560/108121
Copyright Statement: Creative Commons Attribution NonCommercial Licence
Supervisor: van Dyk, David A.; Trotta, Roberto; Stenning, David C.
Sponsor/Funder: Imperial College London
Department: Mathematics
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Mathematics PhD theses



This item is licensed under a Creative Commons License.