Focus
The research group Data Mining and Machine Learning investigates novel approaches to exploratory data analysis and to unsupervised, semi-supervised, and supervised learning. We focus on methods for various data types including texts, graphs, high-dimensional feature vectors and other complex structures. Our team considers different tasks, e.g., representation learning, embedding, clustering, causality detection, classification and reinforcement learning.
Our methods are inspired by challenges arising from different application areas, e.g., medicine, neuroscience, pharmacoinformatics, renewable energies and social sciences.
The unit's research is conducted by seven work groups and reflected in our projects, publications, and the source code of selected publications.
Database Techniques for Data Mining
Group Leader: Assoz. Prof. Dr. Christian Böhm
We advance scalable, trustworthy analytics by integrating modern data mining directly into database and data‑intensive systems. Our research ranges from similarity search and high‑dimensional indexing to clustering, outlier detection, approximate query processing, and adaptive/learned indexing, always with an eye to interactive performance on large, heterogeneous datasets and distributed platforms. Through open‑source prototypes, cross‑domain collaborations, and active community service, we shape next‑generation data platforms that invite students and researchers to push analytics closer to the data.
Data Mining
Group Leader: Univ.-Prof. Dr. Claudia Plant
We have unprecedented opportunities to find profound answers to complex questions, e.g., "What is the best way to get to the airport?" or "Is this drug suitable for me?" However, having more data does not automatically mean gaining more knowledge.
Machine Learning with Graphs
Group Leader: Assoz. Prof. Nils M. Kriege
We design principled methods for learning on structured data, connecting machine learning, graph theory, and algorithmics. Our research focuses on graph embedding, matching and search, ranging from broadly applicable techniques to specialized approaches for challenging problems in cheminformatics and bioinformatics.
Natural Language Processing
Group Leaders:
We explore how machines can understand, reason over, and produce human language by fusing probabilistic learning with linguistic insights. Our work spans information extraction, question answering, semantic parsing, summarization, and classification, with a strong emphasis on multilingual and low‑resource scenarios, robustness, interpretability, and efficient training. We build open, well‑documented resources and tools, partner across disciplines, and translate NLP research into real‑world applications from digital humanities to social and biomedical text.
We investigate how to represent and compute meaning in natural language, linking AI methods with formal and corpus-based approaches to semantics. Our work covers semantic parsing and meaning representations, discourse and coreference, textual inference and commonsense reasoning, and grounded or multimodal language understanding, with attention to robustness, interpretability, and data-efficient learning across languages and domains. We release carefully curated resources and open-source implementations, engage in cross-disciplinary collaborations, and build deployable NLP systems that connect linguistic theory with practical applications.
Probabilistic and Interactive Machine Learning
Group Leader: Assoz. Prof. Dr. Sebastian Tschiatschek
We develop probabilistic models and interactive learning algorithms that make AI systems uncertainty‑aware, data‑efficient, and aligned with human goals. Our work ranges from Bayesian inference and structured priors to active learning, preference elicitation, contextual bandits, and reinforcement learning, linking principled decision‑making under uncertainty with human‑in‑the‑loop experimentation and interface design. By combining theoretical advances with reproducible code and real‑world studies alongside domain experts, we deliver deployable systems for recommendation, healthcare, and adaptive automation.
Responsible Machine Learning
Group Leader: Ass.-Prof. Dr. Martin Pawelczyk
We build accountable and trustworthy ML systems by uniting causal reasoning with interpretability and robustness, so model decisions are explainable and can be improved by concrete actions. We study algorithmic recourse and counterfactuals, fairness auditing and mitigation, distribution shift and uncertainty, and human‑centered evaluation, developing methods that make predictions equitable and reliable in practice. Together with partners, we maintain open benchmarks and toolkits, run user and field studies, and help turn responsible‑ML theory into deployable solutions and policy guidance.
Scalable Algorithms for Graph Mining
Group Leader: Ass.-Prof. Dr. Yllka Velaj
Our research develops scalable algorithms for analyzing, learning from, and optimizing large graph-structured data. We use graphs to model complex relational systems in domains such as social networks, recommender systems, transportation, neuroscience, economics, biology, and the Web.
We focus on graph mining problems related to structure, similarity, diffusion, clustering, classification, and representation learning. This includes identifying important nodes and links, optimizing centrality measures, modeling information diffusion, recommending links, labelling nodes, and discovering communities in large networks.
Our work combines graph algorithms, data mining, machine learning, optimization, and network science to build efficient and interpretable tools for complex relational data.