Critical SGLang Vulnerability Allows Remote Code Execution via Malicious AI Model Files

 

A newly disclosed high-severity flaw in SGLang could enable attackers to remotely execute code on affected servers through specially crafted AI model files.

The issue, tracked as CVE-2026-5760, has received a CVSS score of 9.8 out of 10, placing it in the critical category. Security analysts have identified it as a command injection weakness that allows arbitrary code execution.

SGLang is an open-source framework built to efficiently run large language and multimodal models. Its popularity is reflected in its development activity, with more than 5,500 forks and over 26,000 stars on its public repository.

According to the CERT Coordination Center, the flaw affects the “/v1/rerank” endpoint. An attacker can exploit this functionality to run malicious code within the context of the SGLang service by using a specially designed GPT-Generated Unified Format (GGUF) model file.

The attack relies on embedding a malicious payload inside the tokenizer.chat_template parameter of the model file. This payload uses a server-side template injection technique through the Jinja2 templating engine and includes a specific trigger phrase that activates the vulnerable execution path.

Once the victim downloads and loads the model, often from repositories such as Hugging Face, the risk becomes active. When a request reaches the “/v1/rerank” endpoint, SGLang processes the chat template using its templating engine. At that moment, the injected payload is executed, allowing the attacker to run arbitrary Python code on the server and achieve remote code execution.

Security researcher Stuart Beck traced the root cause to unsafe template handling. Specifically, the framework uses a standard Jinja2 environment instead of a sandboxed configuration. Without isolation controls, untrusted templates can execute system-level code during rendering.

The attack unfolds in a defined sequence: a malicious GGUF model is created with an embedded payload; it includes a trigger phrase tied to the Qwen3 reranker logic located in “entrypoints/openai/serving_rerank.py”; the victim loads the model; a request hits the rerank endpoint; and the template is rendered using an unsafe environment, leading to execution of attacker-controlled Python code.

This vulnerability falls into the same class as earlier issues such as CVE-2024-34359, a critical flaw in llama_cpp_python, and CVE-2025-61620, which affected another model-serving system. These cases highlight a recurring pattern where unsafe template or model handling introduces execution risks.

To mitigate the issue, CERT/CC recommends replacing the current template engine configuration with a sandboxed alternative such as ImmutableSandboxedEnvironment. This would prevent execution of arbitrary Python code during template rendering. At the time of disclosure, no confirmed patch or vendor response had been issued.

From a broader security lens, this incident reinforces a growing concern in AI infrastructure. Model files are increasingly being treated as trusted inputs, despite their ability to carry executable logic. As adoption expands, organizations must validate external models, restrict execution environments, and continuously monitor inference systems to reduce the risk of compromise.

This article has been indexed from CySecurity News – Latest Information Security and Hacking Incidents

Read the original article: