Remote Exploitation Risk Emerges From Ollama Out-of-Bounds Read Flaw

 

Increasing reliance on locally deployed large language model infrastructure has renewed scrutiny of the security posture of self-hosted artificial intelligence platforms, after researchers disclosed a critical vulnerability in Ollama that could allow remote attackers to read sensitive process memory without authorization.
Tracked as CVE-2026-7482 and assigned a CVSS severity score of 9.1, the flaw is an out-of-bounds read that can expose large portions of memory belonging to running Ollama processes, including user prompts, system instructions, configuration data, and environment variables.
Because Ollama is widely used as a local inference platform for open-source large language models such as Llama and Mistral, the disclosure has raised significant concern among the artificial intelligence and cybersecurity communities.
The platform lets organizations and developers run AI workloads directly on their own infrastructure rather than relying on external cloud providers.
With approximately 170,000 stars on GitHub, more than 100 million Docker Hub downloads, and nearly 300,000 internet-accessible server deployments, Ollama's footprint highlights both the growing security risks of rapidly adopted artificial intelligence ecosystems and the sensitivity of the operational data they process.
Cyera, which discovered the vulnerability and dubbed it Bleeding Llama, traced it to insecure handling of GGUF model files within Ollama: the server implicitly trusts tensor dimension values embedded in uploaded models without performing adequate boundary validation. By uploading specially crafted GGUF files, an attacker can abuse this design weakness to manipulate memory access operations during model processing, forcing the application to read data outside its intended buffers and to incorporate fragments of sensitive runtime information into the model artifacts it generates.
The underlying problem is linked to the GPT-Generated Unified Format (GGUF), which is widely used to package and distribute large language models for efficient local execution. Like PyTorch's .pt and .pth files, safetensors, and ONNX models, GGUF enables developers to store and run open-source models directly on local machines without external resources.
The vulnerability stems from the way Ollama processes these files during model creation, specifically in a function named WriteTo() that uses Go's unsafe package. Because the implementation relies on low-level memory operations that bypass the language's standard safety protections, malicious tensor metadata can expose the heap to out-of-bounds reads.
In a typical attack scenario, an adversary crafts a GGUF file with intentionally oversized tensor shape values and submits it to an exposed Ollama instance via the /api/create endpoint. The manipulated dimensions force the application to access memory regions outside the allocated boundaries during parsing and model generation, unintentionally disclosing sensitive information from the Ollama process space.
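The missing boundary validation can be sketched as follows. This is not Ollama's code: tensorInfo is a hypothetical, simplified stand-in for GGUF per-tensor metadata, and readTensor shows the check the vulnerable path skips, namely comparing the size claimed by the tensor shape against the actual payload.

```go
package main

import (
	"errors"
	"fmt"
)

// tensorInfo is a simplified, hypothetical stand-in for the per-tensor
// metadata a GGUF header carries: a shape plus an element size in bytes.
type tensorInfo struct {
	shape    []uint64
	elemSize uint64
}

// byteCount multiplies out the claimed dimensions with an overflow guard;
// oversized shape values can wrap uint64 as well as exceed the file.
func byteCount(t tensorInfo) (uint64, error) {
	n := t.elemSize
	for _, d := range t.shape {
		if d != 0 && n > ^uint64(0)/d {
			return 0, errors.New("tensor size overflows uint64")
		}
		n *= d
	}
	return n, nil
}

// readTensor is the checked variant of the parse step: it refuses to slice
// past the end of the backing buffer, the validation the flaw skips.
func readTensor(buf []byte, off uint64, t tensorInfo) ([]byte, error) {
	n, err := byteCount(t)
	if err != nil {
		return nil, err
	}
	if off > uint64(len(buf)) || n > uint64(len(buf))-off {
		return nil, fmt.Errorf("tensor claims %d bytes at offset %d, file holds %d",
			n, off, len(buf))
	}
	return buf[off : off+n], nil
}

func main() {
	buf := make([]byte, 64) // stands in for the uploaded model payload

	// A 4x4 tensor of 4-byte elements (64 bytes) fits exactly.
	if _, err := readTensor(buf, 0, tensorInfo{shape: []uint64{4, 4}, elemSize: 4}); err == nil {
		fmt.Println("in-bounds tensor accepted")
	}

	// An oversized dimension, as in the crafted-GGUF attack, is rejected.
	if _, err := readTensor(buf, 0, tensorInfo{shape: []uint64{1 << 40}, elemSize: 4}); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Rejecting a claimed size larger than the remaining file, before any slicing, is the boundary validation whose absence makes the crafted dimensions exploitable.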
According to researchers, exposed memory may contain environment variables, authentication tokens, API credentials, system prompts, as well as portions of concurrent user interactions processed b

[…]
Content was cut in order to protect the source. Please visit the source for the rest of the article.

This article has been indexed from CySecurity News – Latest Information Security and Hacking Incidents