Emulation of Malicious Shellcode With Speakeasy

Read the original article: Emulation of Malicious Shellcode With Speakeasy


In order to enable emulation of malware samples at scale, we have
developed the Speakeasy emulation framework. Speakeasy aims to make it
as easy as
possible for users who are not malware analysts to acquire triage
reports in an automated way, as well as enabling reverse engineers to
write custom plugins to triage difficult malware families.

Originally created to emulate Windows kernel mode malware, Speakeasy
now also supports user mode samples. The project’s main goal is high
resolution emulation of the Windows operating system for dynamic
malware analysis for the x86 and amd64 platforms. Similar emulation
frameworks exist to emulate user mode binaries. Speakeasy attempts to
differentiate itself from other emulation frameworks in the following ways:

  • Architected specifically around emulation of Windows
    malware
  • Supports emulation of kernel mode binaries to analyze
    difficult to triage rootkits
  • Emulation and API support
    driven by current malware trends to provide the community with a
    means to extract indicators of compromise with no extra tooling
  • Completely configurable emulation environment requiring no
    additional code

The project currently supports kernel mode drivers, user mode
Windows DLLs and executables, as well as shellcode. Malware samples
can be automatically emulated, with reports generated for later post
processing. The ongoing project goal will be continuing to add support
for new or popular malware families.

In this blog post, we will show an example of Speakeasy’s
effectiveness at automatically extracting network indicators from a
Cobalt Strike Beacon sample acquired from an online malware aggregate.

Background

Dynamic analysis of Windows malware has always been a crucial step
during the malware analysis process. Understanding how malware
interacts with the Windows API and extracting valuable host-based and
network-based indicators of compromise (IOCs) are critical to
assessing the impact malware has on an affected network. Typically,
dynamic analysis is performed in an automated or targeted fashion.
Malware can be queued to execute within a sandbox to monitor its
functionality, or manually debugged to reveal code paths not executed
during sandbox runs.

Code emulation has been used historically for testing, validation
and even malware analysis. Being able to emulate malicious code offers
many benefits for both manual and automated analysis. Emulation of
CPU instructions allows for total instrumentation of binary code where
control flow can be influenced for maximum code coverage. While
emulating, all functionality can be monitored and logged in order to
quickly extract indicators of compromise or other useful intelligence.

Emulation provides several advantages over execution within a
hypervisor sandbox. A key advantage is noise reduction. While
emulating, the only activity that can be recorded is either written by
the malware author, or statically compiled within the binary. API
hooking within a hypervisor (especially from a kernel mode
perspective) can be difficult to attribute to the malware itself. For
example, sandbox solutions will often hook heap allocator API calls
without knowing if the malware author intended to allocate memory, or
if a lower-level API was responsible for the memory allocation.

However, emulation has disadvantages as well. Since we are removing
the operating system from the analysis phase, we, as the emulator, are
now responsible for providing the expected inputs and outputs from API
calls and memory access that occur during emulation. This requires
substantial effort in order to successfully emulate malware samples
that are expected to run on a legitimate Windows system.

Shellcode as an Attack Platform

In general, shellcode is an excellent choice for attackers to remain
stealthy on an infected system. Shellcode runs within executable
memory and does not need to be backed by any file on disk. This allows
attacker code to hide easily within memory where most forms of
traditional forensic analysis will fail to identify it. Either the
original binary file that loads the shellcode must first be
identified, or the shellcode itself must be dumped from memory. To
avoid detection, shellcode can be hidden within a benign-appearing
loader, and then be injected into another user mode process.

In the first part of this blog series, we will show the
effectiveness of emulation with one of the more common samples of
shellcode malware encountered during incident response investigations.
Cobalt Strike is a commercial penetration testing framework that
typically utilizes stagers to execute additional code. An example of a
stager is one that downloads additional code via an HTTP request and
executes the HTTP response data. The data in this case is shellcode
that commonly begins with a decode loop, followed by a valid PE that
contains code to reflectively load itself. In the case of Cobalt
Strike, this means it can be executed from the start of the executable
headers and will load itself into memory. Within the Cobalt Strike
framework, the payload in this case is typically an implant known as
Beacon. Beacon is designed to be a memory resident backdoor used to
maintain command and control (C2) over an infected Windows system. It
is built using the Cobalt Strike framework without any code
modifications, and both its core functionality and its command and
control information can easily be changed at build time.

All of this allows attackers to rapidly build and deploy new
variants of Beacon implants on compromised networks. Therefore, a tool
to rapidly extract the variable components of Beacon is necessary
and, ideally, will not require the valuable time of malware analysts.

Speakeasy Design

Speakeasy currently employs the QEMU-based emulator engine Unicorn
to emulate CPU instructions for the x86 and amd64 architectures.
Speakeasy is designed to support arbitrary emulation engines in the
future via an abstraction layer, but it currently relies on Unicorn.

Full OS sandboxing will likely always be required to analyze all
samples, as generically emulating all of Windows is somewhat
infeasible. Sandboxing can be difficult to scale on demand, and
running samples in a sandbox can be time consuming. However, by making sure we emulate
specific malware families, such as Beacon in this example, we can
quickly reduce the need to reverse engineer variants. Being able to
generate high level triage reports in an automated fashion is often
all the analysis that is needed on a malware variant. This allows
malware analysts more time to focus on samples that may require deeper analysis.

Shellcode or Windows PEs are loaded into the emulated address space.
Windows data structures required to facilitate basic emulation of
Windows kernel mode and user mode are created before attempting to
emulate the malware. Processes, drivers, devices and user mode
libraries are “faked” in order to present the malware with a realistic
looking execution environment. Malware will be able to interact with
an emulated file system, network and registry. All these emulated
subsystems can be configured with a configuration file supplied to
each emulation run.

Windows APIs are handled by Python API handlers. These handlers will
try to emulate expected outputs from these APIs so that malware
samples will continue their expected execution path. When defining an
API handler, all that is needed is the name of the API, the number of
arguments the API expects, and an optional calling convention
specification. If no calling convention is supplied, stdcall is
assumed. Currently, if an API call is attempted that is not supported,
Speakeasy will log the unsupported API and move on to the next entry
point. An example handler for the Windows HeapAlloc function exported
by kernel32.dll is shown in Figure 1.



Figure 1: Example handler for Windows HeapAlloc function
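
As a rough illustration of the handler model described above, the following is a minimal, self-contained sketch. It is not Speakeasy's actual code: the api_handler decorator, the ToyHeap class, and the dispatch function are hypothetical names invented for this example. It only shows how a handler could be registered with an API name, an argument count, and a calling convention that defaults to stdcall.

```python
# Hypothetical registry for illustration only; not Speakeasy's real API.
API_HANDLERS = {}


def api_handler(name, argc, conv="stdcall"):
    """Register a handler by API name, argument count, and calling convention."""
    def register(func):
        API_HANDLERS[name] = {"func": func, "argc": argc, "conv": conv}
        return func
    return register


class ToyHeap:
    """Bump allocator standing in for the emulator's heap bookkeeping."""

    def __init__(self, base=0x10000):
        self.next_addr = base
        self.chunks = {}  # base address -> requested size

    def alloc(self, size):
        addr = self.next_addr
        self.chunks[addr] = size
        # Round the next base up so chunks stay 16-byte aligned.
        self.next_addr += (size + 15) & ~15
        return addr


@api_handler("HeapAlloc", argc=3)
def heap_alloc(heap, argv):
    # LPVOID HeapAlloc(HANDLE hHeap, DWORD dwFlags, SIZE_T dwBytes)
    h_heap, dw_flags, dw_bytes = argv
    return heap.alloc(dw_bytes)


def dispatch(name, heap, stack_args):
    """Call the registered handler, or return None for an unsupported API."""
    entry = API_HANDLERS.get(name)
    if entry is None:
        return None  # a real emulator would log the unsupported API here
    return entry["func"](heap, stack_args[:entry["argc"]])
```

In this sketch, the return value of the handler plays the role of the value the malware would see as the API's result, which is what lets the sample continue along its expected execution path.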

All entry points are emulated by default. For example, for DLLs, all
exports are emulated, and for drivers, the IRP major functions are
each emulated. In addition, dynamic entry points that are discovered
during runtime are followed. Some examples of dynamic entry points
include threads that are created or callbacks that are registered.
Attributing activity to specific entry points can be crucial to seeing
the whole picture when trying to identify the impact of a malware infection.
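
The entry point handling described above can be pictured as a simple work queue. The sketch below is an assumption about the general shape of such a loop, not a description of Speakeasy's internals; emulate_all, fake_run, and the entry point labels are all hypothetical.

```python
from collections import deque


def emulate_all(static_entry_points, run_entry_point):
    """Run every known entry point and follow ones discovered at runtime.

    run_entry_point(ep) must return (events, discovered_entry_points).
    Activity is keyed by entry point so it can be attributed later.
    """
    pending = deque(static_entry_points)
    seen = set(static_entry_points)
    activity = {}
    while pending:
        ep = pending.popleft()
        events, discovered = run_entry_point(ep)
        activity[ep] = events
        for new_ep in discovered:
            if new_ep not in seen:  # avoid emulating the same entry twice
                seen.add(new_ep)
                pending.append(new_ep)
    return activity


def fake_run(ep):
    """Stand-in for real emulation: canned events and discovered entry points."""
    table = {
        "export:DllMain": (["CreateThread"], ["thread:0x401000"]),
        "export:Install": (["RegSetValueExA"], []),
        "thread:0x401000": (["InternetOpenA"], []),
    }
    return table[ep]


activity = emulate_all(["export:DllMain", "export:Install"], fake_run)
```

Here the thread created from DllMain is emulated as its own entry point, so the network activity it performs is attributed to the thread rather than to the export that spawned it.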

Reporting

Currently, all events captured by the emulator are logged and
represented by a JSON report for easy post processing. This report
contains events of interest that are logged during emulation. As in
most emulators, all Windows API calls are logged along with arguments.
All entry points are emulated and tagged with their corresponding API
listings. In addition to API tracing, other specific events are called
out including file, registry and network access. All decoded or
“memory resident” strings are dumped and displayed in the report to
reveal useful information not found within static string analysis.
Figure 2 shows an example of a file read event logged in a Speakeasy
JSON report.



Figure 2: File read event in a Speakeasy report
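
To give a sense of what post processing such a report might look like, here is a small sketch. The report layout used here (entry_points, apis, api_name, file_events, and so on) is an assumption made for this example and may not match the field names of a real Speakeasy report; consult the project documentation for the actual schema.

```python
import json

# Hypothetical report layout; field names are assumptions for illustration.
SAMPLE_REPORT = {
    "entry_points": [
        {
            "ep_type": "shellcode",
            "apis": [
                {"api_name": "kernel32.VirtualAlloc",
                 "args": ["0x0", "0x1000", "0x3000", "0x40"]},
                {"api_name": "wininet.InternetConnectA",
                 "args": ["0x20", "c2.example.test", "0x1bb"]},
            ],
            "file_events": [
                {"event": "read", "path": "C:\\Windows\\win.ini", "size": 64},
            ],
        }
    ]
}


def summarize(report):
    """Collect API names and paths of files read across all entry points."""
    summary = {"apis": [], "files_read": []}
    for ep in report.get("entry_points", []):
        for call in ep.get("apis", []):
            summary["apis"].append(call["api_name"])
        for event in ep.get("file_events", []):
            if event.get("event") == "read":
                summary["files_read"].append(event["path"])
    return summary


# Round-trip through JSON text, as a report loaded from disk would be.
report = json.loads(json.dumps(SAMPLE_REPORT))
summary = summarize(report)
```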

Speed

Because the framework is written in Python, speed is an obvious
concern. Unicorn and QEMU are written in C, which provides very fast
emulation speeds; however, the API and event handlers we write are in
Python. Transitioning between native code and Python is extremely
expensive and should be done as little as possible. Therefore, the
goal is to only execute Python code when it is absolutely necessary.
By default, the only events we handle in Python are memory access
exceptions or Windows API calls. In order to catch Windows API calls
and emulate them in Python, import tables are doped with invalid
memory addresses so that we only switch into Python when import tables
are accessed. Similar techniques are used for when shellcode accesses
the export tables of DLLs loaded within the emulated address space of
the malware. By executing as little Python code as possible, we can
maintain reasonable speeds while still allowing users to rapidly
develop capabilities for the framework.
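
The import table technique can be modeled without a real CPU emulator. The toy below is not Unicorn and not Speakeasy's implementation; ToyEmulator, SENTINEL_BASE, and the addresses are hypothetical. It only illustrates the idea that mapped addresses stay on the fast path, while a fetch from a deliberately invalid import address is the event that hands control to Python.

```python
SENTINEL_BASE = 0xFEED0000  # hypothetical unmapped range reserved for imports


class ToyEmulator:
    def __init__(self, mapped):
        self.mapped = set(mapped)  # addresses backed by real emulated memory
        self.sentinels = {}        # invalid address -> API name
        self.python_calls = 0      # how often we left the "native" fast path

    def add_import(self, api_name):
        """Assign an API an invalid address to be written into the import table."""
        addr = SENTINEL_BASE + 4 * len(self.sentinels)
        self.sentinels[addr] = api_name
        return addr

    def fetch(self, addr):
        """Simulate an instruction fetch at addr."""
        if addr in self.mapped:
            return "native"  # no Python handler involved
        # Unmapped fetch: stands in for the memory access exception.
        self.python_calls += 1
        api_name = self.sentinels.get(addr)
        if api_name is None:
            raise MemoryError(f"invalid fetch at {addr:#x}")
        return api_name
```

Because only sentinel addresses fault, the Python side runs once per API call rather than once per emulated instruction.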

Memory Management

Speakeasy implements a lightweight memory manager on top of the
emulator engine’s memory management. Each chunk of memory allocated by
malware is tracked and tagged so that meaningful memory dumps can be
acquired. Being able to attribute activity to specific chunks of
memory can prove to be extremely useful for analysts. Logging memory
reads and writes to sensitive data structures can reveal the true
intent of malware not revealed by API call logging, which is
particularly useful for samples such as rootkits.

Speakeasy offers an optional “memory tracing” feature that will log
all memory accesses that samples exhibit. This will log all reads,
writes and executes to memory. Since the emulator tags all allocated
memory chunks, it is possible to glean much more context from this
data. If malware hooks a critical data structure or pivots execution
to dynamically mapped memory, this will be revealed and can be useful
for debugging or attribution. This feature comes at a great speed
cost, however, and is not enabled by default.
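
A minimal sketch of the tagging idea follows. The TaggedMemory class and the tag strings are hypothetical and are not taken from Speakeasy; the point is only that, once every chunk carries a tag, a raw address in a memory trace can be resolved back to the allocation it belongs to.

```python
import bisect


class TaggedMemory:
    """Toy allocator that remembers a tag for every chunk it hands out."""

    def __init__(self, base=0x100000):
        self.next_addr = base
        self.starts = []  # chunk base addresses, kept in ascending order
        self.chunks = {}  # base address -> (size, tag)
        self.trace = []   # (kind, address, tag) tuples

    def alloc(self, size, tag):
        base = self.next_addr
        self.starts.append(base)  # bases only grow, so the list stays sorted
        self.chunks[base] = (size, tag)
        self.next_addr += (size + 0xFFF) & ~0xFFF  # page-align the next chunk
        return base

    def tag_for(self, addr):
        """Return the tag of the chunk containing addr, or None."""
        index = bisect.bisect_right(self.starts, addr) - 1
        if index < 0:
            return None
        base = self.starts[index]
        size, tag = self.chunks[base]
        return tag if addr < base + size else None

    def record(self, kind, addr):
        """Log a read, write, or execute access together with its chunk tag."""
        self.trace.append((kind, addr, self.tag_for(addr)))
```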

The emulated environment presented to malware includes common data
structures that shellcode uses to locate and execute exported Windows
system functions. It is necessary to resolve exported functions in
order to invoke the Win32 API and therefore have meaningful impact on
a targeted system. In most cases, Beacon included, these functions are
located by walking the process environment block (commonly called the
PEB). From the PEB, shellcode can access a list of all loaded modules
within a process’s virtual address space.

Figure 3 shows a memory report generated from emulating a Beacon
shellcode sample. Here we can trace the malware walking the PEB in
order to find the address of kernel32.dll. The malware then manually
resolves and calls the function pointer for the “VirtualAlloc” API,
and proceeds to decode and copy itself into the new buffer to pivot execution.



Figure 3: Memory trace report
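
The module lookup that the trace captures can be sketched in a few lines. This is a simplified model rather than real shellcode or Speakeasy code: ModuleEntry stands in for an entry in the loader's circular module list reachable from the PEB, and the module names and base addresses are made up for the example.

```python
class ModuleEntry:
    """Simplified stand-in for one entry in the loader's module list."""

    def __init__(self, name, base):
        self.name = name
        self.base = base
        self.flink = None  # forward link to the next entry


def build_module_list(modules):
    """Build a circular list with a head node, as the emulator would fake it."""
    head = ModuleEntry(None, None)
    prev = head
    for name, base in modules:
        entry = ModuleEntry(name, base)
        prev.flink = entry
        prev = entry
    prev.flink = head  # the last entry links back to the head
    return head


def find_module(head, wanted):
    """Walk the list until the wanted module name is found (case-insensitive)."""
    entry = head.flink
    while entry is not head:
        if entry.name.lower() == wanted.lower():
            return entry.base
        entry = entry.flink
    return None


# Hypothetical load order and base addresses.
head = build_module_list([
    ("sample.exe", 0x00400000),
    ("ntdll.dll", 0x77000000),
    ("KERNEL32.DLL", 0x76000000),
])
kernel32_base = find_module(head, "kernel32.dll")
```

Once the base address is known, shellcode typically parses that module's export table to resolve a function such as VirtualAlloc, which is the step the trace shows next.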

Configuration

Speakeasy is highly configurable and allows users to create their
own “execution profiles”. Different levels of analysis can be
specified in order to optimize individual use cases. The end goal is
allowing users easy switching of configuration options with no code
changes. Configuration profiles are currently structured as JSON
files. If no profile is provided by the user, a default configuration
is provided by the framework. The individual fields are documented
within the Speakeasy project.

Figure 4 shows a snippet of the network emulator configuration
subsection. Here, users can specify what IP addresses get returned
when a DNS lookup occurs, or in the case of some Beacon samples, what
binary data gets returned during a TXT record query. Custom HTTP
responses can be configured as well.



Figure 4: Network configuration
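
Since the original figure is not reproduced here, the following sketch shows what a network section of a JSON profile and its lookup logic might look like. Every key name in PROFILE_JSON, the IP addresses, and the helper functions are assumptions made for illustration; they are not the documented Speakeasy configuration schema.

```python
import json

# Hypothetical profile fragment; key names are assumptions, not the real schema.
PROFILE_JSON = """
{
  "network": {
    "dns": {
      "names": {"localhost": "127.0.0.1", "default": "10.1.2.3"},
      "txt": {"default": "default.bin"}
    },
    "http": {
      "responses": {"GET": {"default": "default.bin"}}
    }
  }
}
"""


def load_network_config(text):
    """Parse a profile and return only its network section."""
    return json.loads(text)["network"]


def resolve_dns(net, hostname):
    """Return the configured IP for hostname, falling back to the default."""
    names = net["dns"]["names"]
    return names.get(hostname.lower(), names["default"])


def http_response_file(net, verb):
    """Return the file whose contents answer this HTTP verb, if configured."""
    return net["http"]["responses"].get(verb.upper(), {}).get("default")


net = load_network_config(PROFILE_JSON)
```

With a profile like this, any hostname the malware looks up resolves to the default address, so changing the emulated network only requires editing the JSON file.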

Many HTTP stagers will retrieve a web resource using an HTTP GET
request. Often, such as with Cobalt Strike or Metasploit stagers, this
buffer is then immediately executed so the next stage of execution can
begin. This response can be easily configured with Speakeasy
configurations. In the configuration in Figure 4, unless overridden,
the framework will supply the data contained in the referenced
default.bin file. This file currently contains debug interrupt
instructions (int3), so if the malware attempts to execute the data,
emulation exits and the attempt is logged in the report. Using this, we can easily
label the malware as a downloader that downloads additional code.
Configuration fields also exist for file system and registry
emulation. Files and registry paths can similarly be configured to
return data to samples that expect to be running on a live Windows system.

Limitations

As noted earlier, emulation comes with some challenges. Maintaining feature
parity with the system being emulated is an ongoing battle; however,
it provides unique opportunities for controlling the malware and
greater introspection options.

In cases where emulation does not complete fully, emulation reports
and memory dumps can still be generated in order to gather as much
data as possible. For example, a backdoor may successfully install its
persistence mechanism, but fail to connect to its C2 server. In this
situation, the valuable host-based indicators are still logged and can
provide value to an analyst.

Missing API handlers can quickly and easily be added to the emulator
in order to handle these situations. For many API handlers, simply
returning a success code will be sufficient to make the malware
continue execution. While full emulation of every piece of malware may
not be feasible, targeting functionality of specific malware families
can greatly reduce the need to reverse eng

[…]

