Open topics for theses and practical courses
Unless otherwise specified, all topics are available as a practical course (P1/P2), a bachelor thesis, or a master thesis.
Interpreting Deep Clustering Results [Practical Course or Master Thesis]
Deep embedded clustering, also called deep clustering, is a growing field that combines ideas from clustering and deep learning. Integrating these techniques makes it possible to learn features automatically from the data and thereby increase clustering performance. Current deep clustering methods are hard to interpret, making it difficult to understand how a clustering result was reached.
The goal of this project is to develop an interactive visualization tool, e.g. a web-based application, for exploring the predictions of deep clustering algorithms and helping to understand their decision-making process.
The student is expected to conduct a literature review of existing visualization techniques developed for (supervised) deep learning, e.g. feature visualizations, that could be applicable to interpreting unsupervised deep clustering algorithms. The identified methods should then be applied to (and, where necessary, adapted for) existing deep clustering algorithms and compared.
Some research questions of interest that should be considered during the project: How suitable are existing visualization techniques for interpreting deep clustering results? How do the different parts of the multi-objective loss of deep clustering techniques relate to each other? Considering multiple clustering models, e.g. k-means vs. DBSCAN, how do the neural network visualizations differ for each of them?
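As a first concrete handle on interpretation, the soft cluster assignments computed by many deep clustering methods (e.g. Deep Embedded Clustering) are a natural per-point quantity for such a tool to visualize. A minimal sketch, assuming 2-D embeddings and plain Python for readability:

```python
import math

def soft_assignments(embeddings, centroids, alpha=1.0):
    """Student-t soft cluster assignments as used in Deep Embedded
    Clustering (DEC): each point gets a probability per cluster,
    which a visualization tool could map to e.g. color intensity."""
    result = []
    for z in embeddings:
        weights = []
        for mu in centroids:
            dist2 = sum((zi - mi) ** 2 for zi, mi in zip(z, mu))
            weights.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        result.append([w / total for w in weights])
    return result

# Toy example: two well-separated 2-D embeddings, two centroids.
q = soft_assignments([(0.0, 0.0), (5.0, 5.0)], [(0.0, 0.0), (5.0, 5.0)])
```

Each row of q sums to one; points near a centroid receive a confident assignment, and exactly this kind of uncertainty information is worth surfacing in the tool.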
Students working on this project need basic background knowledge in machine learning (e.g. Foundations of Data Analysis), visualisation (e.g. Visualisation and Visual Data Analysis), solid programming skills in Python, and ideally some background in PyTorch, deep learning, and a visualization framework such as D3.
Supervisors: Claudia Plant, Lukas Miklautz in collaboration with Aleksandar Doknic and Torsten Möller
Contact: Lukas Miklautz, Claudia Plant
Dynamic Orbit Partition [Practical Course or Master Thesis]
A graph isomorphism is a mapping between the nodes of two graphs that preserves the edges. An automorphism is an isomorphism of a graph to itself. The (non-trivial) automorphisms intuitively represent the symmetries of a graph. Two vertices are considered equivalent if there is an automorphism that maps one vertex to the other. The corresponding equivalence classes are referred to as the orbit partition. The orbit partition is of interest for symmetry-breaking techniques for various combinatorial problems: identifying and pruning symmetric branches in the search tree based on the symmetries of the input instance can lead to drastic speedups. This project aims to develop dynamic algorithms for maintaining the orbit partition when the input graph changes. The focus of the project can be on the development and theoretical analysis of algorithms or on their implementation and experimental evaluation.
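To make the definitions concrete, the following deliberately naive sketch computes the orbit partition by enumerating all vertex permutations; it only illustrates the concept and is factorial-time, whereas the project would build on efficient (dynamic) symmetry-detection machinery:

```python
from itertools import permutations

def orbit_partition(n, edges):
    """Brute-force orbit partition of a graph on vertices 0..n-1 by
    enumerating all n! candidate automorphisms; illustration only."""
    edge_set = {frozenset(e) for e in edges}
    parent = list(range(n))                      # union-find over vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for perm in permutations(range(n)):
        mapped = {frozenset(perm[v] for v in e) for e in edge_set}
        if mapped == edge_set:                   # perm is an automorphism
            for v in range(n):
                parent[find(v)] = find(perm[v])  # merge v with its image
    classes = {}
    for v in range(n):
        classes.setdefault(find(v), []).append(v)
    return sorted(sorted(c) for c in classes.values())

# Path 0-1-2: the endpoints 0 and 2 are symmetric, the centre 1 is not.
path_orbits = orbit_partition(3, [(0, 1), (1, 2)])
```

On a 4-cycle, all vertices fall into one orbit; on the path above, the orbits are {0, 2} and {1}.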
Requirements: Strong interest in algorithms
Contact: Nils Kriege, Christian Schulz, Kathrin Hanauer
Kernelization for Clique Solvers [Practical Course or Master Thesis]
A clique C in an undirected graph G = (V, E), where V is the set of vertices and E is the set of edges, is a subset of V such that all its vertices are pairwise connected. The size of C is its cardinality. The maximum clique problem is to find a clique of maximum size in G. An important generalization is the maximum weight clique problem, in which the graph has a weight function w that assigns a positive integer weight to each vertex, and the weight of a clique C is defined as the total weight of the vertices in C. Clique problems have many important applications, in particular in chemistry and bioinformatics: they are used, for example, to compare molecular structures, to predict protein structures, or to analyze interactions. The goal of the project is to transfer reduction rules that have recently been developed for the dual weighted independent set problem to the maximum clique problem, thereby improving the state of the art for the maximum weight clique problem. On the reduced problem, one can then apply an exact or heuristic algorithm to solve the problem.
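Note that a maximum weight clique in G is exactly a maximum weight independent set in the complement of G, which is why reduction rules can be transferred between the two problems. A naive exact solver, to which reduced instances could be handed, might look like the following sketch (exponential in the worst case, illustration only):

```python
def max_weight_clique(adj, w):
    """Exact maximum weight clique by exhaustive recursion; exponential
    in the worst case, so purely a reference for testing reductions.
    adj: dict vertex -> set of neighbours, w: dict vertex -> weight."""
    best = [0, frozenset()]

    def extend(clique, candidates, weight):
        if weight > best[0]:
            best[0], best[1] = weight, clique
        for v in list(candidates):
            candidates = candidates - {v}        # siblings must not revisit v
            extend(clique | {v}, candidates & adj[v], weight + w[v])

    extend(frozenset(), frozenset(adj), 0)
    return best[0], best[1]

# Triangle 0-1-2 (weight 1 each) plus a heavy pendant vertex 3 on 0:
# the heaviest clique is {0, 3} with weight 6, not the triangle.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
w = {0: 1, 1: 1, 2: 1, 3: 5}
weight, clique = max_weight_clique(adj, w)
```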
Requirements: Strong interest in algorithms, strong programming skills, C++
Contact: Nils Kriege, Christian Schulz, Kathrin Hanauer
Interpretable and Explainable Deep Learning
In this project you will study novel approaches for learning interpretable and explainable deep learning models with the goal of supporting users with different skill levels and knowledge. The models will be evaluated on applications with large societal impact, e.g., labour market data.
Contact: Sebastian Tschiatschek
Dynamic Information Acquisition in Questionnaires
In many areas of our lives, questionnaires or collections of measurements are used to gather information for decision making, e.g., questionnaires are used to measure the success of marketing campaigns or customer satisfaction, and medical tests are conducted to decide on the best treatment for a patient. Common to these applications is that many questions need to be answered or many tests must be conducted. Often this is not because all the information is needed for deriving a decision, but because the course of action is standardized (e.g., a survey always consists of the same questions).
In this project you will evaluate and develop methods for adaptive information acquisition that acquire the information necessary for decision making sequentially and adaptively, e.g., not all questions in a questionnaire are asked in all cases and the order of the questions can change. You will consider and evaluate recent machine learning models for this problem, but also investigate how these methods can be extended to account for dependencies in the information that can be acquired, e.g., dependencies that only allow for a specific order of some questions of a questionnaire, the influence of how questions are asked on the answers, etc.
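The greedy flavour of such adaptive acquisition can be sketched as picking the next question with maximal expected information gain about the target; the data layout and column names below are purely hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def next_question(rows, label, asked):
    """Greedy adaptive acquisition: among the not-yet-asked questions,
    pick the one whose answer maximally reduces entropy of `label`,
    restricted to the rows consistent with the answers so far."""
    pool = [r for r in rows if all(r[q] == a for q, a in asked.items())]
    base = entropy([r[label] for r in pool])
    best_q, best_gain = None, 0.0
    for q in rows[0]:
        if q == label or q in asked:
            continue
        remainder = 0.0
        for val in {r[q] for r in pool}:
            sub = [r[label] for r in pool if r[q] == val]
            remainder += len(sub) / len(pool) * entropy(sub)
        if base - remainder > best_gain:
            best_q, best_gain = q, base - remainder
    return best_q

# Hypothetical answer table: question q1 perfectly predicts the label y.
rows = [{"q1": 0, "q2": 0, "y": 0}, {"q1": 0, "q2": 1, "y": 0},
        {"q1": 1, "q2": 0, "y": 1}, {"q1": 1, "q2": 1, "y": 1}]
first = next_question(rows, "y", {})
```

Here the greedy rule asks q1 first, since q2 carries no information about y; the project would replace this table-based heuristic with learned models.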
Contact: Sebastian Tschiatschek
Deep Probabilistic Clustering for Heterogeneous and Incomplete Data
Clustering is the task of finding groups in a data set. As an unsupervised learning approach it is important for exploratory data analysis, e.g., for finding commonalities between patients suffering from some disease. Often such data is high-dimensional and contains complicated non-linear relationships, which makes it difficult to apply standard clustering approaches. Deep clustering algorithms have been proposed recently to approach this kind of setting by combining clustering and deep learning in a common framework. An open problem that has not been sufficiently addressed in recent deep clustering research is incomplete and heterogeneous data (e.g. survey data, blood tests, MRI images, etc.). Incomplete data is often an issue in practice, especially for health data where not everything about a patient’s history is known.
In this project you will study the clustering of incomplete data with heterogeneous data types, with a particular focus on missingness and on determining which information is relevant for cluster assignments. You will develop an algorithm based on previous work and evaluate it on synthetic and real-world data. Some research questions of interest would be, for instance: How do cluster assignments change when given new information? Or, given an uncertain cluster assignment (e.g. sick or healthy patient), which information would we need to query to make a decision?
Contact: Sebastian Tschiatschek
Multiagent Reinforcement Learning under Mismatch
Enabling intelligent agents to collaborate effectively despite mismatch in perception or capabilities.
Contact: Sebastian Tschiatschek
Multiagent Teaching Primitives
Enabling multiple interacting agents to convey their knowledge and skills through teaching.
Contact: Sebastian Tschiatschek
Abstraction in Reinforcement Learning
Learning abstractions of large state and action spaces in order to facilitate more efficient planning and exploration.
Contact: Sebastian Tschiatschek
Data mining approaches to EEG time series for analysis of antidepressant treatment response [Master Thesis]
A deep learning model for antidepressant treatment analysis based on EEG measurements will be developed. An ANN-based classifier and a clustering algorithm will be proposed to discover the synchronized structure of the brains of treated patients. After familiarizing yourself with the central literature on the topic, you will decide together with your supervisor whether you want to focus on the clustering or on the classification topic.
Contact: Katerina Schindlerova, Claudia Plant
Bivariate causal measures: Experiments with existing causal methods in python on synthetic and real data [Practical Course]
Contact: Katerina Schindlerova
GPU based data mining on Android devices
Energy efficiency for high-performance computing is a hot topic in current scientific research. Many Android™ devices have an integrated GPU and a vendor-supplied OpenCL shared C library on the device. Although not directly supported by Android, these OpenCL libraries allow application developers to address on-board GPUs.
In a former project, a convenient development environment for Android Studio was created. A C wrapper library serves as the link between the Java/JNI/C part of an application and the OpenCL library on the device. Locking mechanisms allow the GPU to be used safely and free resources promptly after the application has been stopped externally (by the user or the operating system).
In this project, the wrapper library should be used to implement data mining algorithms with OpenCL and to compare runtimes on the GPU with those achieved on the device's CPU (Java and C with SIMD instructions). Candidate algorithms are DBSCAN, k-NN, SUBCLU, and OPTICS. Students may implement further algorithms if they wish. An application stub with the OpenCL wrapper library will be provided.
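For validating the OpenCL implementations, a plain (and deliberately slow, O(n²)) Python reference such as the following DBSCAN sketch can serve as a correctness baseline:

```python
import math

def dbscan(points, eps, min_pts, dist):
    """Reference DBSCAN returning one label per point
    (-1 = noise, 0..k-1 = cluster id); O(n^2) neighbour queries."""
    n = len(points)
    labels = [None] * n

    def region(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neigh = region(i)
        if len(neigh) < min_pts:
            labels[i] = -1                  # noise (may become a border point later)
            continue
        labels[i] = cluster
        seeds = list(neigh)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster         # absorb former noise as border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = region(j)
            if len(jn) >= min_pts:          # core point: keep expanding
                seeds.extend(jn)
        cluster += 1
    return labels

points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
labels = dbscan(points, eps=0.5, min_pts=3,
                dist=lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1]))
```

Comparing the OpenCL kernel's labels against such a reference on small inputs catches correctness bugs before any performance measurements.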
Prerequisites:
 Good knowledge of Java, C and OpenCL (Kotlin only upon agreement)
 Knowledge of GPU design
 The student should be familiar with the IDE (Android Studio) for the application development.
 The student must have an Android device (tablet or mobile phone) with an integrated GPU, and the OpenCL library must be available on the device (please check the directories /system/lib, /system/lib64, /vendor/lib, /vendor/lib64 and the technical documentation of the device). The application can NOT be developed with the AVD manager (no GPU support)! If the Android OS version on the device is 7.0 or later (API >= 24), please check that the name of the OpenCL library is listed in /vendor/etc/public.libraries.txt or /system/etc/public.libraries-COMPANYNAME.txt on the device; if it is not, Android does not allow the library to be loaded dynamically at runtime. This is a mandatory prerequisite for the use of the wrapper library.
 For the use of SIMD instructions: either knowledge of (inline) assembler (depending on the student's device CPU architecture: armv7, aarch64, x86, or x86_64) or knowledge of the use of SIMD intrinsics (NEON or SSE) in C.
Contact: Robert Fritze, Claudia Plant
Learning with Reduced Molecular Graphs
Drug discovery at its early stages can greatly benefit from machine learning methods. As molecules are structured objects consisting of atoms and chemical bonds, they cannot be represented by vectors in a straightforward way, but are adequately modeled as graphs. Recent advances in machine learning with graphs have led to well-engineered methods applicable to graphs annotated with node and edge attributes, e.g., graph neural networks. For molecules, different graph representations exist. The most natural approach is to represent atoms as nodes and bonds as edges. However, different so-called reduced graph models exist, where groups of atoms are represented by a single node and their properties by node attributes. The goal of this project is to compare the performance of various graph learning techniques with different graph models. Based on the results, tailored combinations of methods and representations should be developed.
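The reduction step itself amounts to a simple graph contraction once the grouping of atoms (e.g. into rings or functional groups) is given; a minimal sketch:

```python
def reduce_graph(edges, node_group):
    """Contract each group of atoms into one super-node; the grouping
    (e.g. by ring membership or functional group) is assumed given.
    Returns the edge set of the reduced graph between group labels."""
    reduced = set()
    for u, v in edges:
        gu, gv = node_group[u], node_group[v]
        if gu != gv:                      # edges inside a group disappear
            reduced.add(frozenset((gu, gv)))
    return reduced

# Phenol-like toy molecule: a six-ring (atoms 0-5) with an OH on atom 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
node_group = {0: "ring", 1: "ring", 2: "ring", 3: "ring",
              4: "ring", 5: "ring", 6: "OH"}
reduced = reduce_graph(edges, node_group)
```

In the reduced model, the whole ring collapses to one node connected to the OH node; a real pipeline would additionally attach node attributes describing each group.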
Students wanting to work on this topic are expected to have experience in machine learning and basic knowledge of graph theory and algorithms. Solid programming skills, preferably in Python, are required.
Contact: Nils Kriege, Franka Bause
Oracle analysis of distant supervision errors
Please see [Moodle] for a detailed description of this and other Natural Language Processing topics.
Contact: Benjamin Roth
Weakly supervised discourse relation prediction
Please see [Moodle] for a detailed description of this and other Natural Language Processing topics.
Contact: Benjamin Roth
Incomplete Schema Relation Clustering
Please see [Moodle] for a detailed description of this and other Natural Language Processing topics.
Contact: Benjamin Roth
Weakly supervised learning with latent class predictions
Please see [Moodle] for a detailed description of this and other Natural Language Processing topics.
Contact: Benjamin Roth
Gradient matching for semi-supervised learning
Please see [Moodle] for a detailed description of this and other Natural Language Processing topics.
Contact: Benjamin Roth
Threshold finding for knowledge-base completion using Gaussian processes
Please see [Moodle] for a detailed description of this and other Natural Language Processing topics.
Contact: Benjamin Roth
Path-based knowledge-base completion
Please see [Moodle] for a detailed description of this and other Natural Language Processing topics.
Contact: Benjamin Roth
Better sentence representations based on BERT
Please see [Moodle] for a detailed description of this and other Natural Language Processing topics.
Contact: Benjamin Roth
Explainable Policies for Game Play [Master Thesis]
Reinforcement learning has shown remarkable successes in playing board games like Go but also in playing computer games like DOTA. While these results are impressive, the learned strategies (policies) are typically not directly interpretable by humans. However, interpretability is important for instance to enable better collaboration of humans with AI agents or to assess safety of the learned behavior.
In this project you will make such learned policies easier to understand by humans by incorporating constraints on their structure. You will develop and implement a novel policy training algorithm and evaluate it on simple computer games.
Students wanting to work on this topic are expected to have solid coding skills in Python, basic knowledge in PyTorch or TensorFlow, and be curious to learn about reinforcement learning and its applications.
Contact: Sebastian Tschiatschek
Learning Composable Policies [Master Thesis]
Reinforcement learning has shown remarkable successes in application domains like game play and robotics but suffers from large sample complexity, limiting its broader usage. A promising approach for reducing the sample complexity is to learn skills that can be reused when solving a series of related tasks. In the context of robotics, such skills could for instance correspond to lifting a leg or grasping an object.
In this project you will develop and implement a learning algorithm for such composable policies based on the recently proposed Successor Options. You will evaluate this algorithm on several benchmark tasks and compare it to other state-of-the-art algorithms.
Students wanting to work on this topic are expected to have solid coding skills in Python, basic knowledge in PyTorch or TensorFlow, and be curious to learn about reinforcement learning and its applications.
Contact: Sebastian Tschiatschek
The Cost of Feedback
Reinforcement learning has shown remarkable successes in application domains like game play and robotics. These successes were achieved by performing large numbers of interactions of a learning agent with the environment. In these interactions, the agent constantly receives information about rewarding behavior. However, in many applications involving human users (e.g., personal assistants), the reward information is explicitly or implicitly provided by these human users. Therefore, this information must be treated as a scarce and valuable resource, and the (cognitive) cost of providing this information should be accounted for.
In this project you will study the cost of providing different types of feedback used for reinforcement learning in user studies. To this end, you will implement a simple reinforcement learning environment for which a human user can provide reward information in various forms (e.g., comparisons, sorted lists, etc.) and compare the cost of providing this feedback as well as the usefulness of the feedback for training the reinforcement learning agent. Depending on whether you are doing a P1/P2/Bachelor Thesis/Master Thesis, the scope of the project will be adjusted.
Students wanting to work on this topic are expected to have solid coding skills in Python, basic knowledge about the development of interactive web pages, and be curious to learn about reinforcement learning and its applications.
Contact: Sebastian Tschiatschek
Reinforcement Learning for Optimizing Electronic Circuit Design [Practical Course]
Electronic circuits are part of all electric devices we use on a daily basis. When developing a new electric device, tailored electronic circuits have to be constructed for the specific use case, an often time-consuming process. It is therefore appealing to automate this process. Reinforcement learning is a promising approach for this: it has shown remarkable successes in application domains like game play and robotics but has so far rarely been used for designing and optimizing electronic circuits.
In this project you will build the basis for using reinforcement learning for electronic circuit design and optimization. In particular, you will wrap an existing electronic circuit simulator in a reinforcement learning environment such that it can be easily interacted with by standard reinforcement learning algorithms. You will then implement and evaluate standard reinforcement learning algorithms for designing and optimizing simple electronic circuits, e.g., current sources.
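A gym-style wrapper around a circuit simulator could follow the sketch below; the `simulate` placeholder and all parameter names are assumptions for illustration, not the API of any existing simulator:

```python
class CircuitEnv:
    """Minimal gym-style environment sketch around a circuit simulator.
    `simulate` is a toy placeholder (two series resistors at 1 V), not
    the API of any real simulator."""

    def __init__(self, target_current=1.0):
        self.target = target_current
        self.params = None

    def reset(self):
        self.params = {"r1": 1.0, "r2": 1.0}    # component values form the state
        return dict(self.params)

    def step(self, action):
        name, factor = action                   # (component name, multiplicative change)
        self.params[name] *= factor
        current = self.simulate(self.params)
        reward = -abs(current - self.target)    # closer to the spec = higher reward
        done = abs(current - self.target) < 1e-2
        return dict(self.params), reward, done, {}

    def simulate(self, params):
        # Placeholder physics: current through two series resistors at 1 V.
        return 1.0 / (params["r1"] + params["r2"])

env = CircuitEnv(target_current=0.25)
env.reset()
state, reward, done, info = env.step(("r1", 3.0))   # r1: 1.0 -> 3.0
```

Once the real simulator is wrapped behind such a reset/step interface, any standard RL algorithm can interact with it unchanged.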
Students wanting to work on this topic are expected to have solid coding skills in Python and be curious to learn about reinforcement learning and apply it in a realworld application. Some background in electronic circuit design is advantageous but not a prerequisite.
Contact: Sebastian Tschiatschek
Algorithms for Finding Matched Molecular Pairs
The concept of matched molecular pairs (MMP) refers to two molecules which have a large common substructure and differ only by a single local modification. Finding such pairs and analyzing their physical properties, such as biological activity, makes it possible to attribute differences in these properties to the specific structural change.
The problem can be formalized in terms of graph theory and is then closely related to variants of the classical graph isomorphism problem such as the maximum common subgraph problem and graph canonization. In this project, you will implement, develop and experimentally evaluate graph algorithms for finding MMPs. Several algorithmic challenges can be studied in more detail, e.g., the efficient computation of canonical forms in a dynamic setting or the use of index data structures and hashing techniques for acceleration.
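As a reference point for the canonization aspect, the following brute-force canonical form (lexicographically smallest edge list over all relabellings) is useful for validating faster algorithms on tiny graphs:

```python
from itertools import permutations

def canonical_form(n, edges):
    """Naive canonical form: the lexicographically smallest sorted edge
    list over all vertex relabellings. Factorial time, so only a
    reference for validating faster canonization code on tiny graphs."""
    edge_set = {frozenset(e) for e in edges}
    best = None
    for perm in permutations(range(n)):
        relabelled = sorted(tuple(sorted(perm[v] for v in e)) for e in edge_set)
        if best is None or relabelled < best:
            best = relabelled
    return tuple(best)

# Two drawings of the same path graph share one canonical form.
form_a = canonical_form(3, [(0, 1), (1, 2)])
form_b = canonical_form(3, [(2, 0), (0, 1)])
```

Because isomorphic graphs map to the same canonical form, such forms can serve as hash keys for bucketing candidate common substructures when searching for MMPs.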
Requirements: C++ or Java (no experience in chemistry necessary!)
Contact: Nils Kriege, Steffen Hirte
2D pharmacophore descriptors for machine learning and similarity searches
Pharmacophores are an abstract description of the molecular features that are necessary for the molecular recognition of drugs by a biological macromolecule (usually a protein). The molecular features constituting a pharmacophore can be divided into different classes according to the nature of their non-bonding interaction with complementary parts of the macromolecular target receptor. Every molecule has a characteristic pharmacophoric feature pattern in terms of the number, type, and mutual distance of its features. Different molecules that are recognized by the same biological target usually show a high similarity regarding their feature pattern. A database search for molecules whose pharmacophoric profile is similar to a given query pharmacophore can thus reveal novel molecules that have a high chance of binding to the same receptor for which the query was created. To be able to search millions of molecules in acceptable time, a suitable in-memory pharmacophore representation is required which allows for a fast similarity assessment. In cheminformatics, an often-employed representation of molecular structures which fulfills these requirements is fixed-size binary fingerprints (bit strings), where similarity calculations essentially boil down to fast logical bitset operations.
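The core similarity computation on such binary fingerprints is the Tanimoto coefficient, which on bit strings reduces to logical operations and popcounts; a minimal sketch with fingerprints stored as Python integers:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints stored as Python ints
    interpreted as bit strings: |A AND B| / |A OR B|."""
    inter = bin(fp_a & fp_b).count("1")
    union = bin(fp_a | fp_b).count("1")
    return inter / union if union else 1.0

sim = tanimoto(0b101101, 0b100111)   # 3 common bits out of 5 set in total
```

In a C++ implementation on top of CDPKit, the same computation would use fixed-size bitsets and hardware popcount instructions for speed.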
The goal of this project is to implement such a binary encoding also for pharmacophores that can be used for fast pharmacophore similarity searches and for the development of machine learning models. The intended algorithm for converting an arbitrary pharmacophore into a fixed-size bit representation is exemplified in the following figure:
The implementation will be carried out on top of the open source cheminformatics toolkit CDPKit (https://github.com/aglanger/CDPKit) which already provides all required functionality for the I/O of molecular structures and pharmacophore generation.
Requirements: C++ (preferable) or Python, basic knowledge about the fundamentals of organic chemistry advantageous
Contact: Thomas Seidel, Nils Kriege
Detecting errors in protein crystal structures
In this project, students are provided with the electron density of a protein: a three-dimensional grid with a number assigned to each point. Additionally, a molecule is specified through the type and position of each individual atom (on the same grid). Each atom should be associated with a specific amount of electron density in its surroundings. Your goal is to implement an algorithm that identifies unassigned or over-allocated electron density.
The story behind this is the following: proteins are the machinery of the human body. In order to understand the function of these important molecules, a clever technique is used to infer their three-dimensional structure. Shooting an X-ray beam at a protein crystal reveals the electron density, which gives hints on the type and position of the individual atoms. Additional human effort is necessary to fit the protein into this density, but this last step is subjective and error-prone: atoms are overlooked or placed in regions without supporting electron density. For example, in the illustration above, one oxygen is modeled despite the lack of density, and one hydrogen on the top right could just as well be of a different type. Since a medicinal chemist depends on the correctness of the protein structure, a quick way to assess the compatibility of the measured electron density and the inferred atom positions would be highly valuable!
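A starting point could be to sum, for each atom, the density of all grid points within some radius and flag atoms whose support falls below a threshold; the radius and threshold below are arbitrary illustration values:

```python
def density_around(grid, center, radius):
    """Sum the density of all grid points within `radius` of a
    position; `grid` maps (x, y, z) grid coordinates to density."""
    cx, cy, cz = center
    return sum(rho for (x, y, z), rho in grid.items()
               if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2)

def flag_unsupported_atoms(grid, atoms, radius, threshold):
    """Return the atoms whose surrounding density is below `threshold`,
    i.e. candidates for modelling errors."""
    return [a for a in atoms
            if density_around(grid, a["pos"], radius) < threshold]

# Toy grid: strong density at the origin, almost none at (3, 0, 0).
grid = {(0, 0, 0): 5.0, (3, 0, 0): 0.1}
atoms = [{"name": "O1", "pos": (0.0, 0.0, 0.0)},
         {"name": "O2", "pos": (3.0, 0.0, 0.0)}]
flagged = flag_unsupported_atoms(grid, atoms, radius=1.0, threshold=1.0)
```

The converse check (grid regions with high density but no nearby atom) would flag unassigned density in the same spirit.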
Requirements: Javascript or Typescript or similar (no experience in chemistry necessary!)
Contact: Johannes Kirchmair, Nils Kriege
Molecular Puzzle
In this project students apply techniques of non-linear optimization to find the optimal position and rotation of a given molecule in order to optimize its binding to another, static molecular structure. The score of a specific position is calculated using a given function that mimics the forces of physics.
The story behind this is the following: designing a medicine for a given disease can often be broken down into solving this kind of three-dimensional puzzle. Oftentimes, the static molecules are so-called proteins, which control and execute all functions of the human body. These large structures communicate using small molecules as signals. Preventing a disease is often associated with breaking this communication by designing other molecules that block specific proteins. The blocking molecule has to bind well in order to fulfil this task, and this can be evaluated by simulating the optimal position and orientation of the molecule with respect to the given protein. In the simplified 2D illustration above, a large and static protein is indicated by the gray area with some relevant atoms sticking out. The orange glowing molecule can be moved freely, which affects the score. In this example, beneficial interactions of molecule and protein are indicated using green lines, and the orientation on the right would reflect reality more appropriately.
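A toy version of the task (rigid 2-D pose optimization against a stand-in scoring function) can be sketched as follows; the Lennard-Jones-like score and the derivative-free random search are placeholders for the provided scoring function and a proper nonlinear optimizer:

```python
import math
import random

def transform(coords, angle, tx, ty):
    """Rigid 2-D transform: rotate by `angle`, then translate by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in coords]

def score(pose, protein_atoms):
    """Toy Lennard-Jones-like score (minimum -1 at distance 1 per pair);
    a stand-in for the scoring function the project provides."""
    total = 0.0
    for x, y in pose:
        for px, py in protein_atoms:
            r = max(math.hypot(x - px, y - py), 1e-6)
            total += (1.0 / r) ** 12 - 2.0 * (1.0 / r) ** 6
    return total

def random_search(mol, protein_atoms, iters=2000, seed=0):
    """Derivative-free random search over (angle, tx, ty); a real
    project would use a proper nonlinear optimizer instead."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(iters):
        params = (rng.uniform(0.0, 2.0 * math.pi),
                  rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0))
        s = score(transform(mol, *params), protein_atoms)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Single-atom "molecule" vs. a single protein atom at (2, 0): good poses
# place the atom near the score minimum at distance 1.
best_params, best_score = random_search([(0.0, 0.0)], [(2.0, 0.0)])
```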
Requirements: Python or C++ (no experience in chemistry necessary!)
Contact: Steffen Hirte, Nils Kriege
Causal inference for wind turbine extreme events [Bachelor or Master thesis]
This work will mainly focus on the implementation of algorithms for causal detection among extreme event processes and their testing on real data. The causal inference will be tested on wind turbine time series as well as other relevant meteorological time series, as provided by the Austrian ZAMG (Zentralanstalt für Meteorologie und Geodynamik). More details will be provided to interested students on request.
Contact: Katerina Schindlerova
Clustering of physiological, behavioral and climatic data of wild boars (Sus scrofa) [Bachelor or Master thesis]
This thesis will mainly focus on developing a database combining data retrieved from biologgers (body temperature, heart rate, acceleration data, location data), reproductive success, and climate data collected during a research project on wild boars at the Research Institute of Wildlife Ecology, University of Veterinary Medicine. The aim of this work is a functional database that would allow researchers to access and evaluate data on different aspects of climate change effects on wild boar ecology in the long term.
If the student is interested, this basic approach can be broadened by using and adapting, e.g., a machine learning process to allow the interpretation of acceleration data. We have already used machine learning to classify videotaped behaviors (lying, walking, trotting, lactating) based on acceleration data (collected via ear tags). However, it would be very interesting to see whether machine learning could help us detect daily and annual patterns of activity in these data, affected by, e.g., high ambient temperatures.
We would be happy to find an interested student to support the ecological evaluation of these data.
Contact: Claudia Bieber, Claudia Plant
Clustering of spatiotemporal climatological data [Master thesis]
This work will focus on the integration of massive amounts of spatiotemporal climatological data of heterogeneous spatial and temporal resolution, provided by ZAMG. An effective information-theoretic objective function supporting different lengths of time series and different spatial resolutions will be applied. A joint low-dimensional vector space embedding of all data will be learned, which will then be used as a model for data compression by clustering. Measurement sites with similar characteristics will be close to each other in this space, while sites with different characteristics will be far apart. Deep neural nets and greedy optimization techniques, such as iterative least squares methods, will be considered to learn this space. A clustering algorithm will be developed which works with this representation of the data and iteratively refines it during the clustering process.
Contact: Katerina Schindlerova, Claudia Plant
Reinforcement Learning from Implicit and Explicit Feedback
Recent advances in reinforcement learning have enabled superhuman performance of AI agents in many applications, e.g., game play. In this project you will work on building theory and/or models for reinforcement learning from implicit and explicit feedback. In particular, you will consider the setting in which no rewards are observed during execution of the agent but only cumulative reward information is obtained at the end of the execution. As a practical example, assume playing a computer game in which you cannot observe the score while you play but only see your achieved score at the very end. This setting has important applications in human-in-the-loop settings in which feedback can be expensive to obtain.
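The setting can be illustrated with a REINFORCE-style agent on a toy bandit problem where the per-step rewards stay hidden and only their sum is revealed at episode end; all numbers below are arbitrary illustration values:

```python
import math
import random

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def train(episodes=5000, horizon=5, lr=0.05, seed=0):
    """REINFORCE with only a cumulative terminal reward: the per-step
    rewards stay hidden and the agent observes just their sum at the
    end of each episode. Toy 2-armed bandit; arm 1 pays more."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]                  # action preferences (logits)
    arm_mean = [0.2, 0.8]               # hidden per-step expected rewards
    baseline = 0.0                      # running average of returns
    for _ in range(episodes):
        actions, ret = [], 0.0
        for _ in range(horizon):
            a = 0 if rng.random() < softmax(prefs)[0] else 1
            actions.append(a)
            ret += arm_mean[a] + rng.gauss(0.0, 0.1)   # hidden until episode end
        adv = ret - baseline            # only the terminal return is used
        for a in actions:
            p = softmax(prefs)
            for i in range(2):
                prefs[i] += lr * adv * ((1.0 if i == a else 0.0) - p[i])
        baseline += 0.05 * (ret - baseline)
    return softmax(prefs)

probs = train()
```

Despite never seeing individual rewards, the agent learns to prefer the better arm; the credit assignment across steps is exactly what makes this setting harder than standard RL.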
Students working on this project need basic background knowledge in machine learning, solid programming skills in Python, and the desire to work on reinforcement learning.
Contact: Sebastian Tschiatschek
Machine Learning for Personalized Education [Practical Course or Bachelor Thesis]
In this project you will work on predicting students' responses to mathematical questions using machine learning models. This is important to assess a student's skills and provide personalized educational resources. We use realworld data provided by the "Diagnostic Questions: Predicting Student Responses and Measuring Question Quality" challenge at NeurIPS'20. The challenge consists of four different tasks from which a subset can be selected for the project, depending on background knowledge and programming skills.
Students working on this project need basic background knowledge in machine learning, programming skills in Python, and the desire to develop and test machine learning models.
Contact: Sebastian Tschiatschek
Probabilistic Models of Multimodal Distributions
Many real-world phenomena can only be accurately described by multimodal distributions, e.g., the hourly power consumption of a household is (typically) bimodal, with peaks in the morning and the evening. However, learning deep generative models of these multimodal distributions, e.g., using variational autoencoders, remains challenging. In this project you will develop theory and/or models for multimodal distributions based on variational autoencoders. You will evaluate these models on benchmark and real-world datasets.
Students working on this project need basic background knowledge in machine learning, programming skills in Python, and the desire to develop and test machine learning models.
Contact: Sebastian Tschiatschek
Reward Inference for Sequential Decision Making from Diverse and Implicit Feedback [Master Thesis]
Automated sequential decision making is an important application of machine learning systems, in which such a system needs to select a sequence of actions step by step to optimize a reward/utility function. For instance, in autonomous driving, such a system needs to execute a sequence of steering, braking, and acceleration actions, or in a medical intensive care setting, such a system needs to execute a sequence of measurement and treatment actions.
One challenge in realizing such automated sequential decision making systems is the definition of the reward/utility function. For example, in autonomous driving it is hard to specify all the factors which define good driving behavior. In such settings, automatically inferring the reward/utility function from users’ feedback can be beneficial.
This project investigates approaches for reward/utility inference from diverse and implicit feedback, building on ideas for inverse reinforcement learning, active learning, implicit feedback, etc.
Interested students are expected to have solid mathematical and machine learning skills, and have experience in Python and deep learning (using PyTorch or TensorFlow).
Contact: Sebastian Tschiatschek
Imitation Learning Under Domain Mismatch
Reinforcement learning has been successfully used to solve certain challenging sequential decision making problems in recent years. The employed techniques commonly require (i) huge amounts of interactions with the environment and (ii) clearly specified reward signals to work well. In many applications, however, one or both of these requirements are not met. In such cases, imitation learning can be an efficient approach to sequential decision making problems: an expert demonstrates near-optimal behavior and a learning agent attempts to mimic this behavior.
This project considers imitation learning in settings in which there is some form of mismatch between the expert demonstrator and the learning agent. The scope of the project is to study how existing algorithms perform in this setting and to propose modifications to existing algorithms to achieve better performance.
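The simplest instantiation of this idea is behavioral cloning: fit a policy to the expert's (state, action) pairs by supervised learning. A hypothetical linear toy example:

```python
import numpy as np

# Behavioral cloning, the simplest imitation-learning baseline: fit a
# policy to expert (state, action) demonstrations by regression.
# Toy setup (all names hypothetical): a linear expert on 3-d states.
rng = np.random.default_rng(1)
K_expert = np.array([[0.5, -1.0, 0.2]])   # the expert's (unknown) linear policy
states = rng.normal(size=(100, 3))
actions = states @ K_expert.T             # expert demonstrations

# Least-squares fit of a linear policy to the demonstrations
K_hat, *_ = np.linalg.lstsq(states, actions, rcond=None)
print(np.round(K_hat.T, 3))               # recovers K_expert on noiseless data
```

Under domain mismatch (e.g. the expert and agent observe different features or have different dynamics), this naive fit degrades, which is exactly the effect the project would study.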
Students wanting to work on this topic are expected to have a basic understanding of machine learning techniques, solid knowledge of Python and basic knowledge of deep learning libraries (PyTorch or TensorFlow).
Contact: Sebastian Tschiatschek
Selecting Sequences of Items for Non-monotone Functions [Bachelor or Master Thesis]
Many applications involve the selection of sequences of items, e.g., recommender systems and personalization in MOOCs. Common to many of these applications is that the order of the selected items is not random, but that items selected early influence which items are selected later. The problem of selecting sequences of items has been studied in various settings, including that in which dependencies between items are modeled using graphs and monotone submodular functions.
This project aims at extending these settings to cover the case of non-monotone submodular functions by proposing new algorithms and analyzing their properties. The findings are validated by an implementation of the proposed algorithm(s) and a comparison against reasonable baselines.
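For the monotone case, the standard greedy algorithm (with its (1 - 1/e) approximation guarantee under a cardinality constraint) is the natural baseline to start from; a toy coverage example, with hypothetical items and topics:

```python
# Greedy baseline for monotone submodular maximization, here on a
# set-coverage objective; the thesis would extend this to non-monotone
# objectives, e.g. via a randomized greedy rule. Items are hypothetical.
items = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}

def coverage(chosen):
    """Monotone submodular objective: number of topics covered."""
    return len(set().union(*(items[i] for i in chosen))) if chosen else 0

selected, k = [], 2
for _ in range(k):
    # pick the item with the largest marginal gain
    gains = {i: coverage(selected + [i]) - coverage(selected)
             for i in items if i not in selected}
    selected.append(max(gains, key=gains.get))

print(selected, coverage(selected))  # → ['a', 'c'] 6
```

For non-monotone functions, marginal gains can be negative, so plain greedy loses its guarantee; that is precisely where new algorithms and analysis are needed.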
Students wanting to work on this topic are expected to have solid mathematical skills, a basic understanding of machine learning techniques, solid knowledge of Python and deep learning libraries (PyTorch or TensorFlow).
Contact: Sebastian Tschiatschek
Posterior Consistency in Partial Variational Autoencoders
Variational Autoencoders (VAEs) are powerful deep generative models that have been successfully applied in a wide range of machine learning applications. Recently, the Partial VAE (PVAE), a variant of VAEs that can process partially observed inputs, has been proposed, and its effectiveness for data imputation has been demonstrated. Key to the fast training of VAEs and PVAEs is the amortized prediction of posterior distributions from observations. In PVAEs, these posterior distributions are predicted from partial observations.
This project aims at studying the consistency of these posterior distributions for different patterns of missing data. The insights are used to create/train better inference models and thereby improve the quality of PVAEs.
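A minimal sketch (not the actual PVAE set encoder) of amortized inference from partial observations: missing entries are zero-imputed and the observation mask is appended before a linear map predicts the Gaussian posterior parameters. Comparing posteriors under different masks is the consistency question of the project:

```python
import numpy as np

# Sketch of an amortized encoder for partially observed inputs
# (hypothetical weights, single linear layer instead of a deep net):
# zero-impute missing entries, concatenate the observation mask, and
# predict Gaussian posterior parameters (mu, log_var).
rng = np.random.default_rng(2)
d_in, d_z = 4, 2
W = rng.normal(scale=0.1, size=(2 * d_in, 2 * d_z))  # encoder weights

def encode(x, mask):
    """x: data vector; mask: 1 where observed, 0 where missing."""
    h = np.concatenate([np.where(mask == 1, x, 0.0), mask])
    out = h @ W
    return out[:d_z], out[d_z:]          # mu, log_var

x = np.array([0.5, -1.0, 2.0, 0.3])
mu_full, _ = encode(x, np.ones(4))                 # fully observed
mu_part, _ = encode(x, np.array([1.0, 1.0, 0.0, 0.0]))  # half observed
# Posterior consistency: how far apart are the two predicted posteriors?
print(np.round(mu_full - mu_part, 3))
```

Systematically measuring such discrepancies across missingness patterns, and regularizing the inference network to reduce them, is one way the project's goal could be operationalized.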
Students wanting to work on this topic are expected to have solid mathematical skills, a basic understanding of machine learning techniques and good programming skills in Python.
Contact: Sebastian Tschiatschek
Evaluation of different nowcasting techniques and data sources for temperature nowcasting
Nowcasting in meteorology refers to short-term prediction of time series, e.g., a few minutes to hours into the future. For these short-term predictions, the most recent observations are important, as are spatial relationships to upstream observations.
Different methods for nowcasting exist in the meteorological world, often physics-driven or physics-statistical. Only a few studies have focused on applying machine learning and data mining techniques. For the latter, crowdsourced data can nowadays also contain important information; here, data from NetAtmo stations could be valuable. A previous study already investigated different machine learning algorithms for nowcasting of temperature using data from the Austrian MetService (ZAMG). The idea of the proposed thesis is to combine the already existing algorithms with the NetAtmo data by first clustering the ZAMG and NetAtmo data and then developing an algorithm that incorporates the NetAtmo data. Importantly, NetAtmo sites are prone to errors and thus need to be cleaned beforehand.
The developed algorithm will be evaluated against two statistical forecasting methods: an analogue-search-based method and a model-output-statistics method.
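As a minimal, self-contained illustration of the nowcasting task (on synthetic data, since the ZAMG/NetAtmo observations are not public), a linear autoregressive model on recent observations can be compared against a simple persistence baseline:

```python
import numpy as np

# Purely illustrative nowcasting baseline on synthetic temperature data
# (a daily-cycle sinusoid plus noise, standing in for station data):
# predict the next value from the k most recent observations with a
# linear autoregressive model, versus persistence (last value repeated).
rng = np.random.default_rng(3)
t = np.arange(500)
temp = 10 + 8 * np.sin(2 * np.pi * t / 144) + rng.normal(0, 0.3, 500)

k = 6                                              # last 6 observations as features
X = np.stack([temp[i:i + k] for i in range(len(temp) - k)])
y = temp[k:]
w, *_ = np.linalg.lstsq(X[:400], y[:400], rcond=None)  # train on first 400 steps

mae_ar = np.abs(X[400:] @ w - y[400:]).mean()          # autoregressive model
mae_persist = np.abs(X[400:, -1] - y[400:]).mean()     # persistence baseline
print(round(float(mae_ar), 3), round(float(mae_persist), 3))
```

A real evaluation would add spatial features from neighboring (and cleaned NetAtmo) stations and compare against the statistical reference methods named above.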
Contact: Irene Schicker, Claudia Plant
Exploratory Data Analysis on the GPU - CUDA Warp-Level Primitives and Independent Thread Scheduling [Bachelor or Master Thesis]
Volta’s new independent thread scheduling capability enables finer-grain synchronization and cooperation between parallel threads, and its new combined L1 data cache and shared memory subsystem significantly improves performance while also simplifying programming. Here, we will enhance traditional data mining algorithms with the use of the GPU and its independent thread scheduling based on CUDA intrinsics. Candidate algorithms are KMeans, DBSCAN, the Apriori algorithm, or dimensionality reduction techniques such as SVD or PCA.
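A minimal CPU reference of KMeans, one of the candidate algorithms, shows the two steps that map naturally onto GPU parallelism: the per-point assignment and the per-cluster reduction. This is a sketch on toy data, not the CUDA implementation itself:

```python
import numpy as np

# CPU reference of KMeans on two synthetic Gaussian blobs; the project
# would map the assignment step (parallel over points) and the update
# step (a reduction per cluster) onto CUDA warp-level primitives.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 0.2, (50, 2)), rng.normal(3.0, 0.2, (50, 2))])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])     # fixed initial centers

for _ in range(10):
    # assignment step: nearest center per point (embarrassingly parallel)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    # update step: per-cluster mean (a parallel reduction on the GPU)
    centers = np.stack([X[labels == k].mean(0) for k in range(2)])

print(np.round(np.sort(centers[:, 0]), 1))       # centers near the two blobs
```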
Contact: Martin Perdacher
Exploratory Data Analysis with Google TPU [Bachelor or Master Thesis]
The Tensor Processing Unit (TPU) was announced in 2016 at Google I/O, when the company said that the TPU had already been used inside their data centers for over a year. The chip has been specifically designed for Google’s TensorFlow framework, a symbolic math library used for machine learning applications such as neural networks. Here, we will enhance traditional data mining algorithms with the use of the TPU. Candidate algorithms are KMeans, DBSCAN, the Apriori algorithm, or dimensionality reduction techniques such as SVD or PCA.
Contact: Martin Perdacher
Causal inference among climatological time series with extreme events [Master thesis]
This work will focus mainly on the implementation of an algorithm for causal detection among processes, as well as on testing it on heavy-tailed probability distributions and on climatological data provided by ZAMG (Zentralanstalt für Meteorologie und Geodynamik) following such distributions. More details will be provided to interested students on request.
Contact: Katerina Schindlerova
Data Mining on Real World Accelerometer Time Series [Master thesis]
In data mining and machine learning, the choice of distance measure is a crucial design decision that strongly depends on the data structure and application. There are plenty of distance measures: Euclidean, Manhattan, edit distance, Dynamic Time Warping, SAX, Hamming, and many more. When analyzing time series data, key information is given by the ordering of the observations, so the distance measure of choice should also regard this ordering, as Dynamic Time Warping (DTW) does.
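As a starting point, DTW can be implemented with the standard O(nm) dynamic program; the toy series below (hypothetical data) show how DTW, unlike a point-wise Euclidean comparison, is invariant to a small time shift:

```python
import math

# Dynamic Time Warping distance between two 1-d series, via the
# standard O(n*m) dynamic program with squared point-wise costs.
def dtw(a, b):
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# The same shape shifted by one step: DTW absorbs the shift, the
# point-wise squared Euclidean distance does not.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(dtw(a, b), sum((x - y) ** 2 for x, y in zip(a, b)))  # → 0.0 4
```

For accelerometer data, one would typically add a warping-window constraint (e.g. Sakoe-Chiba) to keep the comparison both fast and physically plausible.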
Further research questions are: How much data is enough? How many features are enough? What sampling rate is high enough?
You will start with a literature review of time series distance measures and test them on accelerometer time series for supervised and unsupervised data mining tasks.
Contact: Maximilian Leodolter
Implementation of a Data Mining Approach for Short-range Temperature Forecasts
Short-range forecasts of wind speeds (i.e., 1-72 hours into the future) and in particular nowcasting (i.e., very short-range forecasts with a time horizon of up to 6 hours) are vital for a wide range of applications. In contrast to wind, temperature typically changes gradually and may thus need a different setup than wind speed. Temperature involves daily fluctuations, which are well predictable but dependent on the time of day, which must be considered in the training dataset. Depending on the location, rapid temperature changes may also occur, related, for example, to cold air pools or Föhn. Thus, temperature also highly depends on the location (specific topography, prevailing weather conditions, and atmospheric dynamics). Depending on the application, we can give either point-based or area-based predictions. Points refer to a particular location (e.g., a weather station), whereas spatial forecasts typically give a forecast for each grid cell over a region (i.e., each forecast is valid for the whole cell).
ZAMG (Zentralanstalt für Meteorologie und Geodynamik) employs (gridded) numerical weather prediction models in conjunction with observation data for short-range forecasting, and a nowcasting system, INCA, for the prediction of meteorological parameters. Alternatively, machine learning methods are now being implemented. In particular, an artificial neural network (ANN) and a random forest (RF) are used in an experimental setup to show the skill of these methods and, possibly, to serve as an additional point forecasting method for the 10-meter wind speed. The existing methods can be used for temperature as well with the same training set, but the setup needs to be adapted.
The proposed student project shall address temperature forecasting (at two meters height) at meteorological observation sites by machine learning and data mining methods (e.g., random forests, feed-forward/back-propagation artificial neural networks, kernel methods, etc.) and input feature selection for the training data set. It is possible to experiment with related meteorological parameters as well (e.g., dew point temperature and relative humidity). Related work suggests using a back-propagation neural network for predicting the 2 m dew point temperature and the 2 m temperature. This can be used as a starting point to find a suitable selection of the training data for the current model and then to extend our current approaches with a back-propagation neural network in order to set up a first prototype for temperature prediction by machine learning methods. The new model shall be tested on various scenarios (e.g., different prevailing weather conditions, locations, seasons) in order to compare the new data mining based model with the currently employed nowcasting system INCA.
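To make the back-propagation baseline concrete, here is a minimal feed-forward network trained with plain gradient descent in NumPy; the data are synthetic stand-ins, not ZAMG features:

```python
import numpy as np

# Minimal one-hidden-layer feed-forward network with back-propagation,
# of the kind mentioned above for 2 m temperature regression.
# Toy regression data (hypothetical) stand in for meteorological inputs.
rng = np.random.default_rng(5)
X = rng.normal(size=(256, 3))                    # toy input features
y = (np.sin(X[:, 0]) + 0.5 * X[:, 1])[:, None]   # toy regression target

W1, b1 = rng.normal(scale=0.5, size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)

for _ in range(2000):                            # full-batch gradient descent
    h = np.tanh(X @ W1 + b1)                     # forward pass
    pred = h @ W2 + b2
    g = 2 * (pred - y) / len(X)                  # d(MSE)/d(pred)
    gW2, gb2 = h.T @ g, g.sum(0)                 # backprop through layer 2
    gh = (g @ W2.T) * (1 - h ** 2)               # backprop through tanh
    gW1, gb1 = X.T @ gh, gh.sum(0)               # backprop through layer 1
    W1 -= 0.05 * gW1; b1 -= 0.05 * gb1
    W2 -= 0.05 * gW2; b2 -= 0.05 * gb2

print(round(float(((pred - y) ** 2).mean()), 3))  # training MSE after fitting
```

In the actual project, such a network would be trained on the selected ZAMG input features and validated against the INCA nowcasting system.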
The work is co-supervised by ZAMG's section for numerical weather prediction (NWP) applications. The developed method shall have a Python-based front end and a C/C++ back end, and use CSV- or SQLite-based meteorological data (provided by ZAMG) in order to align with other machine learning implementations running in our IT environment. Finally, the developed method will be set up in our development environment (Python 2.7/Linux 64-bit, multi-core shared-memory machine) to provide forecasts and validation of the method for selected test scenarios.
Contact: Petrina Papazek, Claudia Plant
SIGMOD programming contest
The ACM Special Interest Group on Management of Data organizes an annual programming contest. The contest and its content will be announced in February on the SIGMOD website, see: https://sigmod2020.org/. The prize pool is often up to $5000 for computing resources (cloud access). For this "Praktikum" we organize a team of 2 to 4 people.
Contact: Martin Perdacher
Completed
Yigit Berkay Bozkurt, Bachelor Thesis: "Anomaly Detection by Heterogeneous Graphical Granger Causality and its Application to Climate Data", 2019
Christina Pacher, Bachelor Thesis: "Clustering Weather Stations: A Clustering Application for Meteorological Data", summer term 2019
Thomas Spendlhofer, Bachelor Thesis: "Evaluating the usage of Tensor Processing Units (TPUs) for unsupervised learning on the example of the k-means algorithm", summer term 2019
Ernst Naschenweng, Bachelor Thesis: "A cache optimized implementation of the Floyd-Warshall Algorithm", summer term 2018
Hermann Hinterhauser, Bachelor Thesis: "ITGC: Information-theoretic grid-based clustering", summer term 2018, accepted paper in EDBT 2019 (download available here)
Mahmoud A. Ibrahim, Bachelor Thesis: "Parameter-Free Mixed-Type Density-Based Clustering", winter term 2017/2018, accepted paper in DEXA 2018 (download available here)
Markus Tschlatscher: "Space-Filling Curves for Cache-Efficient LU Decomposition", winter term 2017/18
Theresa Fruhwuerth, Master Thesis: "Uncovering High Resolution Mass Spectrometry Patterns through Audio Fingerprinting and Periodicity Mining Algorithms: An Exploratory Analysis", summer term 2017
Robert Fritze, PR1 "Combining spatial information and optimization for locating emergency medical service stations: A case study for Lower Austria", summer term 2017
Alexander Pfundner, PR2 "Integration of Density-based and Partitioning-based Clustering Methods", summer term 2017
Anton Kovác, Katerina Hlavácková-Schindler, Erasmus project, "Graphical Granger Causality for Detection of Temporal Anomalies in EEG Data", winter term 2016/17 (download available here)