Mining and similarity search in temporal databases
Kremer, Hardy; Seidl, Thomas (Thesis advisor)
1st ed. - Aachen : Apprimus-Verlag (2013)
Dissertation / PhD Thesis
In: Ergebnisse aus der Informatik, vol. 1
Page(s)/Article No.: III, 216, XXVIII pp.: illustrations, diagrams
Insights from database research, notably in data mining and similarity search, together with advances in storage and microprocessor technology, have enabled users to analyze and explore large-scale datasets. Data mining is the task of extracting previously unknown knowledge from data; similarity search comprises techniques for finding objects that are similar in content. Temporal data is a prominent kind of data in these tasks, standing out for its information richness and its many possible applications. This thesis contributes novel, advanced methods for data mining and similarity search on temporal databases.
A major challenge in data mining research is the effectiveness of the approaches, i.e., the quality of the extracted patterns. The thesis addresses this challenge for the mining task of temporal clustering. First, it develops a clustering technique designed specifically for the requirements of real-world time series. Even in difficult settings with various measurement errors and misalignments between time series, the technique correctly identifies patterns concealed in temporal or dimensional subspaces of the data domain. Second, it contributes new methods for the complex task of mapping clusters between clusterings, investigated in two applications: tracing evolving clusters in spatio-temporal data and evaluating clustering results in data stream scenarios.
Distance functions that measure the similarity between objects are at the core of content-based similarity search systems and of many data mining tasks. An effective but computationally expensive distance function for time series is based on adaptive warping of the time axis (dynamic time warping, DTW). This thesis introduces novel methods for queries under time warping. These methods exploit previously unused information in filter-and-refine frameworks to achieve substantial runtime improvements.
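The cluster-mapping task mentioned above can be illustrated with a simple, generic overlap-based matching. This is a hypothetical sketch, not the mapping method developed in the thesis: it assumes each clustering is given as a dict from a cluster id to a set of object ids, scores cluster pairs with the Jaccard index, and accepts a match only above a user-chosen threshold. The names `jaccard`, `best_matches` and `emerged` are illustrative.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

# Hypothetical representation: cluster id -> set of object ids.
Clustering = Dict[Hashable, Set[Hashable]]
Match = Optional[Tuple[Hashable, float]]


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard overlap |a & b| / |a | b| of two clusters (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(old: Clustering, new: Clustering,
                 threshold: float = 0.5) -> Dict[Hashable, Match]:
    """Map each old cluster to the new cluster with the largest Jaccard
    overlap, or to None if no overlap reaches `threshold` (the cluster
    disappeared or split into small parts)."""
    result: Dict[Hashable, Match] = {}
    for oid, old_members in old.items():
        best_id, best_score = None, 0.0
        for nid, new_members in new.items():
            score = jaccard(old_members, new_members)
            if score > best_score:
                best_id, best_score = nid, score
        result[oid] = (best_id, best_score) if best_score >= threshold else None
    return result


def emerged(new: Clustering, matches: Dict[Hashable, Match]) -> Set[Hashable]:
    """New clusters that no old cluster was mapped to."""
    matched = {m[0] for m in matches.values() if m is not None}
    return set(new) - matched


old = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}}
new = {"X": {1, 2, 3, 4, 8}, "Y": {5}, "Z": {6, 7, 9}}
print(best_matches(old, new, threshold=0.6))  # {'A': ('X', 0.8), 'B': None}
print(emerged(new, best_matches(old, new, threshold=0.6)))  # contains 'Y' and 'Z'
```

Here cluster B split into Y and Z, and neither part overlaps it strongly enough to count as its successor. A one-to-one best match is the simplest choice; detecting splits and merges explicitly would require looking at all pairs above the threshold rather than only the best one.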
The anticipatory pruning technique uses distance information from a given filter step to reject candidates rapidly in the refinement step, while the multiple-query approach exploits shared characteristics between queries to prune candidates jointly. The presented approaches are analyzed experimentally and evaluated against competing solutions. Overall, the techniques and results of this thesis represent a major advance in data mining and similarity search research on temporal data.
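The two pruning ideas can be sketched for a DTW range query in a filter-and-refine setting. This is a minimal sketch in their spirit, not the thesis's implementation; it assumes equal-length series, a Sakoe-Chiba band of radius `r`, squared point costs, and LB_Keogh as the filter. For anticipatory pruning, the per-position LB_Keogh terms of the filter step are reused during refinement: while DTW is computed row by row, the minimum of the current row plus the lower-bound terms of all remaining rows already bounds the final distance, so a candidate is dropped once that sum exceeds the threshold `eps`. For multiple queries, a joint envelope (pointwise minimum of the lower and maximum of the upper envelopes) gives one bound that holds for every query at once. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Series = Sequence[float]
Envelope = Tuple[List[float], List[float]]


def envelope(s: Series, r: int) -> Envelope:
    """Lower/upper envelope of s within a Sakoe-Chiba band of radius r."""
    n = len(s)
    lo = [min(s[max(0, i - r):i + r + 1]) for i in range(n)]
    up = [max(s[max(0, i - r):i + r + 1]) for i in range(n)]
    return lo, up


def lb_terms(c: Series, env: Envelope) -> List[float]:
    """Per-position LB_Keogh terms of c against an envelope; their sum lower-bounds
    the banded DTW distance between c and the series the envelope was built from."""
    lo, up = env
    return [(x - u) ** 2 if x > u else (x - l) ** 2 if x < l else 0.0
            for x, l, u in zip(c, lo, up)]


def dtw_anticipatory(a: Series, b: Series, r: int, eps: float,
                     row_lb: Sequence[float]) -> float:
    """Banded DTW with squared point costs, computed row by row over a.
    row_lb[i] must lower-bound the cost of any band cell in row i. Stops early
    (returning inf) once the best partial path plus the bounds of all
    remaining rows exceeds eps."""
    n, m = len(a), len(b)
    assert n == m, "sketch assumes equal-length series"
    suffix = [0.0] * (n + 1)  # suffix[i] = sum(row_lb[i:])
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + row_lb[i]
    prev = [0.0] + [math.inf] * m
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        # Rows i+1..n (0-based positions i..n-1) still add at least suffix[i].
        if min(cur) + suffix[i] > eps:
            return math.inf
        prev = cur
    return prev[m]


def range_query(queries: List[Series], database: List[Series], r: int,
                eps: float) -> Dict[int, List[int]]:
    """Indices of database series within DTW distance eps of each query."""
    envs = [envelope(q, r) for q in queries]
    # Joint envelope: loosest bounds over all queries, valid for each of them.
    group = ([min(v) for v in zip(*(lo for lo, _ in envs))],
             [max(v) for v in zip(*(up for _, up in envs))])
    hits: Dict[int, List[int]] = {k: [] for k in range(len(queries))}
    for idx, c in enumerate(database):
        if sum(lb_terms(c, group)) > eps:
            continue  # pruned for all queries with one bound computation
        for k, q in enumerate(queries):
            terms = lb_terms(c, envs[k])  # filter step
            if sum(terms) > eps:
                continue
            # Refinement reuses the filter's per-position terms.
            if dtw_anticipatory(c, q, r, eps, terms) <= eps:
                hits[k].append(idx)
    return hits


queries = [[0, 1, 2, 3, 2, 1, 0, 0], [0, 1, 2, 3, 3, 2, 1, 0]]
database = [[0, 0, 1, 2, 3, 2, 1, 0], [5] * 8, [0, 1, 2, 3, 2, 1, 0, 0]]
print(range_query(queries, database, r=1, eps=2.0))  # {0: [0, 2], 1: [0, 2]}
```

The constant series is rejected by the joint envelope alone, without touching either query individually. Because the joint envelope is looser than any single query's envelope, it prunes less per query but is evaluated only once per candidate, so it tends to pay off when the queries resemble each other.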