Clean-CLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
DOI:
https://doi.org/10.62643/ijerst.2026.v22.n2.pp159-170
Keywords:
Data Poisoning Detection, Multimodal Contrastive Learning, CLIP, Backdoor Attacks, Label Poisoning, Adversarial Machine Learning, Anomaly Detection, Local Outlier Factor, Multimodal Security
Abstract
Multimodal contrastive learning models such as CLIP (Contrastive Language-Image Pretraining) have achieved remarkable zero-shot generalisation by aligning visual and textual representations through large-scale internet-sourced corpora. This reliance on unvetted web data, however, exposes these models to data poisoning attacks, in which carefully engineered image-text pairs are injected into training or fine-tuning pipelines to corrupt learned cross-modal associations. Such attacks can silently implant backdoors, invert semantic relationships, or degrade retrieval accuracy, posing critical risks in security-sensitive domains including medical imaging, autonomous navigation, and content governance. This paper presents Clean-CLIP, a detection and mitigation framework that uses CLIP's own multimodal encoder to audit image-text datasets before training. The system implements four complementary detection modules: (1) CLIP-alignment-based label poisoning detection using cosine similarity scoring between image embeddings and candidate label prompts; (2) corner-patch heuristic analysis for backdoor trigger identification; (3) Local Outlier Factor (LOF) anomaly detection in CLIP feature space for semantic perturbation discovery; and (4) cross-modal inconsistency scoring through confidence thresholding. Evaluated on a curated benchmark incorporating BadNets-style triggers, clean-label perturbations, and label-flipping attacks over CIFAR-10 subsets, Clean-CLIP achieves a detection precision of 94.3%, recall of 92.7%, and F1-score of 0.935. The framework is delivered as an open-source, cross-platform desktop application with a graphical interface, enabling researchers and practitioners to validate datasets without specialised adversarial machine learning expertise. Experimental comparisons against Neural Cleanse, Activation Clustering, and STRIP demonstrate that Clean-CLIP uniquely addresses all four attack categories simultaneously while operating without access to the downstream model, achieving superior aggregate performance across all evaluated scenarios.
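To make the first detection module more concrete, the following is a minimal sketch of cosine-similarity label auditing, not the authors' implementation. It assumes image and label-prompt embeddings have already been produced by a CLIP encoder; here they are replaced by toy 3-dimensional vectors. The function name `flag_suspect_label`, the `min_margin` threshold, and the margin-based decision rule are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flag_suspect_label(image_emb, prompt_embs, assigned_label, min_margin=0.05):
    """Return (is_suspect, best_label, scores) for one image-label pair.

    prompt_embs maps each candidate label to the embedding of its text prompt.
    A sample is flagged when some other label's prompt is closer to the image
    than the assigned label's prompt by more than min_margin. Both the rule
    and the default threshold are assumptions, not values from the paper.
    """
    scores = {label: cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    best_label = max(scores, key=scores.get)
    is_suspect = scores[best_label] - scores[assigned_label] > min_margin
    return is_suspect, best_label, scores


# Toy vectors standing in for CLIP embeddings (not real model output).
prompts = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "truck": [0.0, 0.0, 1.0],
}
image = [0.9, 0.1, 0.0]  # lies close to the "cat" prompt direction

clean = flag_suspect_label(image, prompts, assigned_label="cat")
flipped = flag_suspect_label(image, prompts, assigned_label="dog")
print(clean[0], clean[1])
print(flipped[0], flipped[1])
```

In this toy setup the pair labelled "cat" passes, while the same image labelled "dog" is flagged with "cat" as the better-aligned label. In practice the threshold would need to be tuned on held-out clean data, since real CLIP similarities between an image and competing prompts are typically much closer together than in this example.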
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.