[AI Security and Privacy Team] SafePickle: The Current State of Pickle-Based File Threats and an In-Depth Analysis of Multi-Layered Defenses Against Pickle Deserialization Attacks
Abstract
Model sharing platforms such as HuggingFace have become central to machine learning development and deployment. However, many shared models, particularly PyTorch-based ones, are serialized using Python’s Pickle protocol—a format known to allow remote code execution upon deserialization. Despite this well-documented risk, Pickle remains prevalent, underscoring the need for effective mitigation techniques.
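As background (not taken from the paper), the minimal Python sketch below illustrates why unpickling untrusted data amounts to code execution: an object's `__reduce__` hook can name an arbitrary callable that `pickle.loads()` invokes during deserialization. The harmless `print` call stands in for an attacker-controlled command such as `os.system`.

```python
import pickle

# Hypothetical payload class illustrating the __reduce__ mechanism.
class Malicious:
    def __reduce__(self):
        # Pickle will call the returned callable with the returned args
        # while deserializing; a real attack would use something like os.system.
        return (print, ("arbitrary code ran during unpickling",))

payload = pickle.dumps(Malicious())
# Loading the bytes executes the callable -- the Malicious class itself is
# not needed on the loading side, only the referenced builtin.
pickle.loads(payload)
```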
This paper provides a systematic security analysis of Pickle-based threats in large-scale AI model repositories. We identify two major deficiencies in existing defenses:
Static Parsing Limitations
Common static parsers fail to recognize malformed or obfuscated Pickle files, leaving many samples unanalyzed. We develop a robust parsing approach that increases successful extraction rates from 51% to 85% on real-world samples from HuggingFace.
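The paper does not reproduce its parser here, but a minimal sketch of best-effort, opcode-level extraction conveys the idea: tolerate truncated or obfuscated streams instead of discarding them. The helper names (`extract_opcodes`, `interesting_opcodes`, `SUSPICIOUS`) are hypothetical, not the paper's API.

```python
import pickletools
from typing import List, Tuple

def extract_opcodes(data: bytes) -> Tuple[List[str], bool]:
    """Best-effort opcode extraction from a possibly malformed pickle stream.
    Returns the opcode names seen so far and whether parsing completed cleanly."""
    opcodes = []
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            opcodes.append(opcode.name)
    except Exception:
        # Malformed or obfuscated streams raise here; keep the partial trace
        # instead of dropping the whole sample as "unparseable".
        return opcodes, False
    return opcodes, True

# Opcodes that import or invoke callables -- the ones a scanner cares about.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def interesting_opcodes(data: bytes) -> List[str]:
    ops, _complete = extract_opcodes(data)
    return [op for op in ops if op in SUSPICIOUS]
```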
Ineffective Scanning Heuristics
Current scanners rely on API-level whitelists or blacklists, resulting in a high false-positive rate: over 98% of the files they flag as unsafe are most likely benign. To address this, we propose a semantic analysis framework that leverages Large Language Models (LLMs) to classify Pickle behavior. Our method achieves 96% accuracy in detecting malicious files and reduces false positives to 3% on flagged samples.
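To make the LLM-based idea concrete, here is one possible shape of such a classifier, assuming nothing about the paper's actual prompts or models: disassemble the pickle into a readable opcode trace and ask a language model whether deserializing it does anything beyond loading tensors. `llm_complete`, `classify_pickle`, and the prompt text are placeholders for illustration only.

```python
import io
import pickletools

def disassemble(data: bytes) -> str:
    """Render the pickle's opcode stream as text for the classifier prompt."""
    out = io.StringIO()
    try:
        pickletools.dis(data, out=out)
    except Exception:
        pass  # keep whatever was disassembled before the stream broke
    return out.getvalue()

PROMPT_TEMPLATE = (
    "You are a security analyst. Below is the opcode disassembly of a Python "
    "pickle file from a machine-learning model repository. Decide whether "
    "deserializing it would execute anything beyond loading tensors and "
    "standard containers. Answer with exactly one word: MALICIOUS or BENIGN.\n\n"
    "{disassembly}"
)

def classify_pickle(data: bytes, llm_complete) -> str:
    """`llm_complete` is a placeholder for any completion API
    (prompt string in, response string out)."""
    prompt = PROMPT_TEMPLATE.format(disassembly=disassemble(data)[:8000])
    return "MALICIOUS" if "MALICIOUS" in llm_complete(prompt).upper() else "BENIGN"
```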
We propose generic methodologies to cover the identified gaps, and we advocate a multi-layered approach that combines them for maximum security.
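One way the layers could compose, reusing the hypothetical helpers sketched above (this is an illustration, not the paper's pipeline): cheap opcode heuristics filter out files that never import or invoke callables, and only the remainder is sent to the more expensive semantic classifier.

```python
def scan_model_file(data: bytes, llm_complete) -> str:
    """Illustrative layering: assumes interesting_opcodes() and
    classify_pickle() from the sketches above."""
    if not interesting_opcodes(data):
        return "BENIGN"  # nothing beyond data-loading opcodes was found
    return classify_pickle(data, llm_complete)
```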
Our findings demonstrate that current defenses are too coarse and that integrating semantic classifiers can significantly improve security without sacrificing usability. We release our tools and dataset to foster further research in secure model serialization.
About the Speaker
Daniel Gilkarov is a PhD student at Ariel University, specializing in AI security and machine learning defenses. His research explores innovative approaches to steganalysis, malware detection in AI model weights, and zero-trust architectures, with a focus on protecting models from hidden threats.