Principled Bayesian modeling and statistical learning with non-representative data in astrophysics
File | Description | Size | Format |
---|---|---|---|
Autenrieth-M-2023-PhD-Thesis.pdf | Thesis | 19.74 MB | Adobe PDF |
Title: | Principled Bayesian modeling and statistical learning with non-representative data in astrophysics |
Authors: | Autenrieth, Maximilian |
Item Type: | Thesis or dissertation |
Abstract: | This thesis tackles the fundamental issue of non-representative data in astrophysics via the development and application of methodology within the areas of statistical machine learning, Bayesian statistics and causal inference, in order to efficiently handle big data, to allow for probabilistic and principled parameter estimation with proper uncertainty quantification, and to deal with systematic uncertainties and biases in the data collection process. To enable (a) statistically principled, (b) scientifically justified, and (c) computationally efficient analysis of non-representative, complex astrophysical data, this thesis provides novel general-purpose statistical methodology, as well as statistical methodology tailored to topical scientific problems in cosmology and high-energy astrophysics, grouped into three related projects: (i) We propose a simple, statistically principled, and theoretically justified general-purpose method, StratLearn, to improve supervised learning when the training set is not representative, a situation known as covariate shift. Building upon a well-established methodology in causal inference, we show that the effects of covariate shift can be reduced or eliminated by conditioning on propensity scores. We demonstrate that fitting learners within strata constructed on the estimated propensity scores improves upon state-of-the-art importance weighting methods on two topical scientific tasks – conditional density estimation of galaxy redshift (photo-z), and photometric supernovae type Ia (SNIa) classification. (ii) We improve weak lensing photo-z calibration via Bayesian hierarchical modeling of full galaxy photo-z conditional density estimates obtained within StratLearn. We substantially improve the galaxy tomographic bin assignment, and obtain almost unbiased estimates of target population means within tomographic bins. (iii) We propose a science-driven hierarchical Bayesian framework to estimate the galaxy luminosity distribution in X-rays, combining non-representative X-ray and optical surveys. Our proposed framework accounts for incompleteness bias by incorporating an X-ray incompleteness function (estimated from simulations) and an optical incompleteness function (with parameters learned from the observed data) into the model. This allows for improved recovery of the luminosity function even with high proportions of systematic incompleteness, as evaluated on simulations and applied to data from the Chandra Deep Field South (CDFS). |
Content Version: | Open Access |
Issue Date: | Aug-2023 |
Date Awarded: | Nov-2023 |
URI: | http://hdl.handle.net/10044/1/108121 |
DOI: | https://doi.org/10.25560/108121 |
Copyright Statement: | Creative Commons Attribution NonCommercial Licence |
Supervisor: | van Dyk, David A.; Trotta, Roberto; Stenning, David C. |
Sponsor/Funder: | Imperial College London |
Department: | Mathematics |
Publisher: | Imperial College London |
Qualification Level: | Doctoral |
Qualification Name: | Doctor of Philosophy (PhD) |
Appears in Collections: | Mathematics PhD theses |
This item is licensed under a Creative Commons License