catch22: CAnonical time-series CHaracteristics
File(s)
Lubba2019_Article_Catch22CAnonicalTime-seriesCHa.pdf (1.54 MB)
Published version
Author(s)
Type
Journal Article
Abstract
Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147,000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a set of 22 CAnonical Time-series CHaracteristics, catch22, tailored to the dynamics typically encountered in time-series data-mining tasks. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.
Date Issued
2019-11-01
Date Acceptance
2019-08-01
Citation
Data Mining and Knowledge Discovery, 2019, 33 (6), pp.1821-1852
ISSN
1384-5810
Publisher
Springer Science and Business Media LLC
Start Page
1821
End Page
1852
Journal / Book Title
Data Mining and Knowledge Discovery
Volume
33
Issue
6
Copyright Statement
© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Sponsor
GlaxoSmithKline Services Unlimited
Engineering & Physical Science Research Council (E
Natural Environment Research Council (NERC)
Engineering & Physical Science Research Council (EPSRC)
Rosetrees Trust
Natural Environment Research Council [2006-2012]
Grant Number
3000551036
EP/K503733/1
NE/L012456/1
EP/N014529/1
A1173/ M577
NE/L002515/1
Subjects
Science & Technology
Technology
Computer Science, Artificial Intelligence
Computer Science, Information Systems
Computer Science
Time series
Classification
Clustering
Time-series features
Artificial Intelligence & Image Processing
0801 Artificial Intelligence and Image Processing
0804 Data Format
0806 Information Systems
Publication Status
Published
Date Publish Online
2019-08-09