Reversing at Scale: AI-Powered Malware Detection for Apple’s Binaries

TL;DR:
We ran our new AI-based Mach-O analysis pipeline in production: no metadata, no prior detections, just raw Apple binaries. On Oct 18, 2025, out of 9,981 first-seen samples, VT Code Insight surfaced multiple real Mac and iOS malware cases that had zero antivirus detections at submission time, including a multi-stage AppleScript infostealer and an iOS credential-stealing tweak. It also helped identify 30 antivirus false positives, later confirmed and fixed.

By Bernardo Quintero, Tom Bennett, and Paul Tarter

The Challenge: Reversing at Scale

The long-term goal of Code Insight is ambitious but simple to state: use AI to reason about every single file that reaches VirusTotal in real time. That's more than two million samples a day, so scalability and efficiency aren't nice-to-haves; they're requirements.

We started this journey in early 2023 by analyzing small PowerShell scripts under 25 KB, focusing on fast, context-limited reasoning. As Gemini's token capacity grew, we expanded support to larger files and richer formats: Office documents with macros, PDFs containing embedded objects, and package types such as NPM, SWF, SVG, MCP, CRX, and VSIX. Each step pushed the boundaries of what Code Insight could interpret automatically.

Eventually, we reached compiled binaries, by far the most challenging class due to their size, complexity, and low-level structure. Analyzing native code with large language models is not straightforward: Mach-O binaries can be massive, and full decompilation or disassembly often exceeds even the largest model contexts, while being too slow and expensive for a high-volume production pipeline.

To make this feasible, we built a pruning-based summarization layer. Instead of feeding Gemini a full decompilation or noisy disassembly, we first extract the most informative elements: code entry points, key imports and exports, relevant strings, and selected function summaries, derived using Binary Ninja's High Level Intermediate Language (HLIL) for native code. The goal isn't to reconstruct the full program logic, but to preserve just enough structure for meaningful reasoning.
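To give a feel for what such a pruning layer might look like, here is a minimal, hypothetical sketch in Python. It assumes the structural elements (entry points, imports/exports, strings, per-function summaries) have already been extracted from the binary, and packs them into a prompt under a fixed token budget. All names, heuristics, and the 4-characters-per-token estimate are illustrative assumptions, not VirusTotal's actual pipeline.

```python
# Hypothetical pruning-based summarization sketch: pack pre-extracted
# Mach-O elements into a token-budgeted prompt for an LLM. Heuristics
# and names here are illustrative assumptions only.

SUSPICIOUS_HINTS = ("http", "osascript", "keychain", "base64", "curl")

def score_string(s: str) -> int:
    """Crude relevance heuristic: favor strings containing suspicious substrings."""
    return sum(hint in s.lower() for hint in SUSPICIOUS_HINTS)

def approx_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token)."""
    return max(1, len(text) // 4)

def build_prompt(elements: dict, token_budget: int = 1000) -> str:
    lines = []
    # Structural facts go first: entry points and imports/exports are
    # cheap and highly informative, so they are never pruned.
    lines.append("Entry points: " + ", ".join(elements["entry_points"]))
    lines.append("Imports: " + ", ".join(elements["imports"]))
    lines.append("Exports: " + ", ".join(elements["exports"]))
    budget = token_budget - sum(approx_tokens(l) for l in lines)
    # Strings are pruned by relevance score until the budget runs out.
    for s in sorted(elements["strings"], key=score_string, reverse=True):
        cost = approx_tokens("String: " + s)
        if cost > budget:
            break
        lines.append("String: " + s)
        budget -= cost
    # Per-function summaries (e.g., distilled from HLIL) fill what remains.
    for name, summary in elements["function_summaries"].items():
        entry = f"Function {name}: {summary}"
        cost = approx_tokens(entry)
        if cost > budget:
            break
        lines.append(entry)
        budget -= cost
    return "\n".join(lines)
```

A usage example under the same assumptions: feeding in a handful of extracted elements yields a compact, ranked text block that a single model call can summarize, with low-value strings dropped first as the budget tightens.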

This distilled representation fits comfortably within Gemini's 1M-token context window and allows us to generate a concise, human-readable analyst summary in a single LLM call, regardless of the binary's size. It's a pragmatic balance between depth and scalability.

[…]

This article has been indexed from VirusTotal Blog
