vLLM: Security Check Bypass via assert Statement in Activation Function Loading Allows Arbitrary Code Execution
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.22.0, an assert-based security check in vLLM's activation function loading allows any unauthenticated attacker to achieve arbitrary code execution on the server by publishing a malicious HuggingFace model, when vLLM runs in Python optimized mode (python -O or PYTHONOPTIMIZE=1). This vulnerability is fixed in 0.22.0.
vllm: vLLM: Arbitrary code execution via malicious HuggingFace model
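A flaw was found in vLLM, an inference and serving engine for large language models (LLMs). The security check performed during activation function loading is implemented as an assert statement, and Python removes assert statements when it runs in optimized mode (python -O or PYTHONOPTIMIZE=1), so in that mode the check never executes. By publishing a malicious HuggingFace model, an unauthenticated attacker can achieve arbitrary code execution on any server that loads the model while vLLM runs in Python optimized mode.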
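The following is a minimal sketch of why an assert-based check fails in optimized mode. It is not vLLM's code: the allowlist ALLOWED and the functions check_with_assert and check_with_raise are hypothetical names chosen for illustration. The sketch compiles the same source once with asserts enabled and once with the optimization level that python -O applies, then compares the results.

```python
# Illustrative only: ALLOWED and both check functions are hypothetical,
# not taken from vLLM's source.
SOURCE = '''
def check_with_assert(name):
    # Removed entirely when compiled at optimization level 1 or higher.
    assert name in ALLOWED, "untrusted activation"
    return name

def check_with_raise(name):
    # An explicit raise survives every optimization level.
    if name not in ALLOWED:
        raise ValueError("untrusted activation")
    return name
'''


def load_checks(optimize):
    """Compile SOURCE at the given optimization level (0 = normal, 1 = -O)."""
    namespace = {"ALLOWED": {"gelu", "relu"}}
    code = compile(SOURCE, "<demo>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace


def is_rejected(check, name):
    """Return True if the check refuses the given name."""
    try:
        check(name)
    except (AssertionError, ValueError):
        return True
    return False


normal = load_checks(optimize=0)
optimized = load_checks(optimize=1)

for label, ns in (("normal", normal), ("optimized", optimized)):
    print(label,
          "assert check rejects:", is_rejected(ns["check_with_assert"], "os.system"),
          "| raise check rejects:", is_rejected(ns["check_with_raise"], "os.system"))
```

In this sketch the assert-based check stops rejecting the untrusted name once the code is compiled at level 1, while the explicit raise keeps rejecting it. This is the general reason security checks are usually written with an explicit conditional and exception instead of assert.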
Avoid running vLLM with python -O or PYTHONOPTIMIZE=1 until updated packages are available. Only load models from trusted sources. Restrict who can deploy or update models on inference endpoints. Apply network access controls and authentication in front of vLLM APIs.
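As one possible way to enforce the first mitigation, a deployment wrapper could refuse to start when the interpreter is in optimized mode. The sketch below assumes a hypothetical helper, ensure_asserts_enabled, that an operator would call before launching the server. It is not part of vLLM. It relies on sys.flags.optimize, which reflects both the -O command-line flag and the PYTHONOPTIMIZE environment variable.

```python
import sys


def ensure_asserts_enabled(optimize_level=None):
    """Raise if Python is running in optimized mode.

    Hypothetical startup guard, not part of vLLM. The optional
    optimize_level argument exists only so the check can be exercised
    without restarting the interpreter.
    """
    level = sys.flags.optimize if optimize_level is None else optimize_level
    if level > 0:
        # Explicit raise, so this guard still works under -O.
        raise RuntimeError(
            f"Python optimization level is {level}; assert statements are "
            "disabled. Restart without -O and with PYTHONOPTIMIZE unset."
        )
    return level
```

A wrapper script would call ensure_asserts_enabled() with no arguments before importing or starting the server. This guard only addresses the optimized-mode condition and does not replace upgrading or restricting model sources.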