EDBT2012: An Adaptive Online Algorithm for Time Series Segmentation with Error Bound Guarantee

Published in The 15th International Conference on Extending Database Technology (EDBT), (CCF Rank B, Acceptance rate: 22.5%), 2012

Authors: Zhenghua Xu, Rui Zhang, Ramamohanarao Kotagiri and Udaya Parampalli.
Abstract: The volume of time series data grows rapidly in various applications such as network traffic management, telecommunications, finance and sensor network. To reduce the cost of storage, transmission and processing of time series data, the need for more compact representations of time series data is compelling. Segmentation is one of the most commonly used methods to meet this requirement. Both PLA and PPA are common segmentation methods which divide a time series into segments and use a linear function or a polynomial function to approximate each segment, respectively. However, while most of the current PLA and PPA methods aim to minimize the holistic error between the approximation and the original time series, few works try to represent time series as compact as possible with an error bound guarantee on each data point. Furthermore, in many real world situations, the patterns of the time series do not follow a constant rule such that using only one type of functions may not yield the best compaction. Motivated by these observations, we propose an online segmentation algorithm which approximates time series by a set of different types of candidate functions (polynomials of different orders, exponential functions, etc.) and adaptively chooses the most compact one as the pattern of the time series changes. A challenge in this approach is to determine the approximation function on the fly (“online”). Thereby, we further propose a novel method to efficiently generate the compact approximation of a time series in an online fashion for several types of candidate functions. This method incrementally narrows the feasible coefficient spaces of candidate functions in coefficient coordinate systems such that it can make each segment as long as possible given an error bound on each data point. Extensive experimental results show that our algorithm generates more compact approximations of the time series with lower average errors than the state-of-the-art algorithm.

[Download paper here]