My Google Summer of Code Experience

Integrating Vision Large Language Models into Anomalib

During my time as a Software Engineer - Machine Learning in the Google Summer of Code program (May 2024 - Aug 2024), I worked on an exciting project with OpenVINO's Anomalib. My main focus was on integrating Vision Large Language Models (VLLMs) into the Anomalib framework to achieve Zero/Few Shot anomaly detection models.

Project Goals

Key Achievements

Challenges and Learning

Throughout the project, I encountered specific challenges related to model performance. The OpenAI ChatGPT model worked effectively out of the box, delivering satisfactory results. However, the open-source models presented significant difficulties—they were not trained to handle multiple images effectively or did not perform well in the tasks required. These challenges highlighted the limitations of current open-source models in comparison to proprietary solutions and underscored the importance of continued development and training for such models.

Code and Documentation

You can find the code I worked on during this project at the following repositories:

Code Not Merged

During the project, several models were explored but ultimately not merged due to performance issues. Below are the details of these models and the reasons they were not integrated into the main branch:

What's Left to Do

While significant progress has been made, there are still several tasks left to complete:

Overall, my Google Summer of Code experience was incredibly rewarding, allowing me to contribute to cutting-edge machine learning technology while collaborating with talented developers from around the world.