RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO2 sensors in London
File(s)
Bogaert2024.pdf (10.92 MB)
Published version
Author(s)
Bogaert, Martin
Mouritzen, Christian
Johnson, Matthew S
van Reeuwijk, Maarten
Type
Journal Article
Abstract
High-density low-cost air quality sensor networks are a promising technology to monitor air quality at high
temporal and spatial resolution. However, the collected data is high-dimensional and it is not always clear how to
best leverage this information, particularly given the lower data quality coming from the sensors. Here we report
on the use of robust Principal Component Analysis (RPCA) using nitrogen dioxide data obtained from a recently
deployed dense network of 225 air pollution monitoring nodes based on low-cost sensors in the Borough of
Camden in London. RPCA addresses the brittleness of singular value decomposition towards outliers by using a
decomposition of the data into low-rank and sparse contributions, with the latter containing outliers. The modal
decomposition enabled by RPCA identifies major periodic patterns including spatial and temporal bias, dominant
spatial variance, and north-south bias. The five most descriptive components capture 98 % of the data’s variance,
achieving a compression by a factor of 1500. We present a new technique that uses the sparse part of the data to
identify hotspots. The data indicates that at the locations of the top 15 % most susceptible nodes in the network,
the model identifies 23 % more hotspots than in all other locations combined. Moreover, the median hotspot
event at these at-risk locations exceeds the mean NO2 concentration by 33 μg/m³. We show the potential of RPCA
for signal correction; it corrects random errors yielding a reference signal with R² > 0.8. Moreover, RPCA
successfully reconstructs missing data from a sensor with R² = 0.72 from the rest of the sensor network, an
improvement upon PCA of around 50 %, allowing air quality estimations even if a sensor is out of use
temporarily.
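The low-rank plus sparse decomposition described in the abstract can be illustrated with a minimal sketch. The record does not say which RPCA solver the authors used, so the code below assumes the standard principal component pursuit formulation solved with an augmented Lagrangian iteration. The function names (`shrink`, `svd_threshold`, `rpca`), the default parameters, and the layout of the data matrix (rows as time steps, columns as sensor nodes) are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def shrink(X, tau):
    """Element-wise soft thresholding: pulls every entry towards zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values of X, which lowers its rank."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def rpca(M, max_iter=1000, tol=1e-7):
    """Split M into a low-rank part L and a sparse part S with M ~ L + S.

    Assumed formulation: minimise ||L||_* + lam * ||S||_1 subject to
    L + S = M, using commonly quoted default values for lam and mu.
    """
    M = np.asarray(M, dtype=float)
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2))         # weight of the sparse term
    mu = n1 * n2 / (4.0 * np.abs(M).sum())   # augmented Lagrangian penalty
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                     # Lagrange multipliers
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

Under these assumptions, `L` would hold the smooth, network-wide patterns and `S` the isolated excursions, so a hotspot analysis in the spirit of the abstract would look at large positive entries of `S` per sensor column.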
Date Issued
2024-05-15
Date Acceptance
2024-03-04
Citation
Science of the Total Environment, 2024, 925
ISSN
0048-9697
Publisher
Elsevier
Journal / Book Title
Science of the Total Environment
Volume
925
Copyright Statement
© 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
License URL
http://creativecommons.org/licenses/by/4.0/
Identifier
https://www.sciencedirect.com/science/article/pii/S0048969724016632
Subjects
Air Quality Monitoring
AIR-QUALITY
AMBIENT
CALIBRATION
Environmental Sciences
Environmental Sciences & Ecology
Hotspot identification and signal correction
Life Sciences & Biomedicine
Low-Cost Sensor Networks
MONITORING NETWORK
PARTICULATE MATTER
POLLUTION
Robust Principal Component Analysis (RPCA)
Science & Technology
Spatial and Temporal Patterns in Air Pollution
Publication Status
Published
Article Number
171522
Date Publish Online
2024-03-15