Data Mining

Many applications, e.g., in biomedicine, the web, and sensor networks, generate tremendous amounts of data. We have unprecedented opportunities to find profound answers to complex questions, e.g., “What is the best way to get to the airport?” or “Is this drug suitable for me?” However, having more data does not automatically mean gaining more knowledge.
To exploit the opportunities in Big Data, we need intelligent and efficient algorithms that translate the information in data into understandable knowledge. The Data Mining research group, headed by Prof. Dr. Claudia Plant, investigates methods that comprehensively support the process of knowledge discovery from Big Data.

In our current research, we mainly focus on information-theoretic methods. To make the information in data measurable, we link data mining to data compression. If data contains non-random structure, such as dependencies or other patterns, a data mining algorithm can find it. We then use the knowledge gained about these patterns to compress the data: the better the discovered patterns describe the data, the better it compresses. The compression rate thus serves as a very general quality measure for data mining.
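The link between structure and compressibility can be illustrated with a minimal sketch. It assumes Python's general-purpose zlib compressor as a stand-in for a model-based coding scheme; it is not one of the group's algorithms, and the function name and toy inputs are hypothetical. Data with a regular pattern compresses to a small fraction of its size, while random bytes barely compress at all.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size / original size (lower means more structure was exploited)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical toy inputs, chosen only for illustration.
structured = b"ABCD" * 2500        # 10,000 bytes with an obvious repeating pattern
random_like = os.urandom(10_000)   # 10,000 bytes without exploitable structure

ratio_structured = compression_ratio(structured)
ratio_random = compression_ratio(random_like)

print(f"structured: {ratio_structured:.3f}")
print(f"random:     {ratio_random:.3f}")
```

In this toy setting, a lower ratio indicates that the compressor found more regularity in the data.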
Based on this idea, we focus on three central aspects:

  • Generalization of Methods.
    We are confronted not only with an ever-increasing flood of data. The research field of data mining itself is also highly dynamic, with sophisticated new algorithms proposed every day. This is both a blessing and a curse: it is very difficult for users to choose a suitable algorithm for the problem at hand from among thousands of options. Therefore, we investigate methods that generalize different basic approaches to data mining. We aim to design algorithms that support the detection of any kind of pattern in a data-driven way.
  • Integration of Data.
    Despite the large variety of existing data mining techniques, most of them support only a certain type of input data, e.g., feature vectors, categorical data, or graphs. Applications typically produce data from heterogeneous sources and of heterogeneous types, e.g., a graph with additional numerical and categorical attributes characterizing the nodes and the edges. We investigate methods that discover interesting relationships between different data sources. Of particular interest are novel types of patterns that emerge from integrating data, e.g., patterns in the attributes that influence the link pattern in the graph.
  • High-performance Data Mining on Modern Hardware.
    To enable knowledge discovery from massive data, we exploit the computing power of modern hardware. We develop massively parallel data mining algorithms for multi-core architectures and graphics processing units. We also investigate index structures to speed up essential building blocks of our algorithms, e.g., k-nearest-neighbor queries.
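
As a point of reference for the k-nearest-neighbor queries mentioned in the last item, the following is a minimal brute-force sketch. It scans every point, which is exactly the cost an index structure tries to avoid. The function name `knn_query` and the sample points are hypothetical and serve only as an illustration; this is a baseline, not one of the group's index-based or parallel methods.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_query(points: List[Point], query: Point, k: int) -> List[Tuple[float, int]]:
    """Return the k points nearest to `query` as (distance, index) pairs, nearest first."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Linear scan: compute the Euclidean distance to every point, keep the k smallest.
    distances = ((math.dist(p, query), i) for i, p in enumerate(points))
    return heapq.nsmallest(k, distances)


# Hypothetical sample data.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
neighbors = knn_query(points, (0.0, 0.1), k=2)

for distance, index in neighbors:
    print(index, points[index], round(distance, 3))
```

Each query here costs time linear in the number of points; an index structure would aim to prune most distance computations.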

Our vision is the Knowledge Factory.

Generalization and integration support the discovery of interesting knowledge from heterogeneous data. We develop and evaluate our methods in the context of interdisciplinary projects with experts from neuroscience, biology, and environmental science. We teach the basics of data mining together with practice-based exercises. In advanced courses, we discuss recently published approaches. We offer students research-oriented topics for Bachelor's and Master's theses.

We are part of the research platform Data Science @ Uni Vienna.