Friday, July 1, 2016

Searching memory with Rekall

This blog post covers the new searching capability in the latest Rekall release (Starting from Rekall 1.5.2). The searching capabilities in Rekall are powered by the Efilter project.

Customizing plugin output

Rekall is a plugin based framework. This means that Rekall comes with many plugins written by different contributors. For example, one of the most popular plugins is the pslist plugin. We can see some help about the pslist plugin by following it with a question mark (?):
We can see that by default, the plugin can filter the output by pid, or process name. These filters are common to all plugins which deal with processes. For example, suppose we want to only see the svchost.exe processes:
The pslist plugin has a typical tabular output. There are a number of columns which were pre-chosen by the plugin author, such as the address of the _EPROCESS struct, the process name, the pid etc.

This information is very useful. However, we cannot customize the output much beyond the columns hard-wired into the plugin. For example, we might also want to sort the output by start time, pid or something else.

How can we do this?  The solution is implemented by the Efilter library and the search plugin implemented in Rekall.

Efilter and the search plugin.

Efilter is a filtering framework which implements an SQL-like search language. This approach is not new. For example other tools (such as Volatility) can produce output into sqlite tables, which can subsequently be filtered using SQL.

The main difference with the Efilter approach is that Efilter does not actually use pre-extracted data, but rather runs Rekall plugins on demand automatically in order to satisfy each query. As we see below, this allows queries to inspect data which was never even exported directly by the plugin - giving a complete and flexible interface for inspection of analysis results.

The general search process is illustrated above. Efilter analyzes the query to figure out which plugins will be run, then the output from these plugins is fed into the Efilter framework where the specified filters are applied. The query can specify a set of columns to display (and possibly their sorting order). The result is a customized tabular output governed by the specified query.
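The pipeline described here can be sketched in a few lines of Python. This is only an illustration of the idea and not Efilter's actual implementation: fake_pslist is a hypothetical stand-in for a plugin, and its rows and field names are invented for the example.

```python
def fake_pslist():
    """Hypothetical stand-in for a plugin: yields one dict per row."""
    yield {"name": "svchost.exe", "pid": 612, "wow64": False}
    yield {"name": "cmd.exe", "pid": 2044, "wow64": True}
    yield {"name": "svchost.exe", "pid": 880, "wow64": False}


def search(plugin, columns, where=None, order_by=None, desc=False):
    """Run the plugin on demand, then filter, sort and project its rows."""
    rows = plugin()  # the plugin only runs when a query needs it
    if where is not None:
        rows = (row for row in rows if where(row))
    if order_by is not None:
        rows = sorted(rows, key=lambda row: row[order_by], reverse=desc)
    # Keep only the requested columns, in the requested order.
    return [{col: row[col] for col in columns} for row in rows]


result = search(
    fake_pslist,
    ["name", "pid"],
    where=lambda row: row["name"] == "svchost.exe",
    order_by="pid",
    desc=True,
)
for row in result:
    print(row)
```

The point of the sketch is that no data is extracted ahead of time: the plugin function is only called when a query asks for it.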

Here is a trivial example:

select * from pslist()

In this example we simply select all the output from the pslist() plugin (using the SQL * specifier). Efilter simply runs the pslist plugin and outputs all the rows produced by the plugin with no filtering. The output is also identical to the pslist plugin.

In the above query, Efilter just re-emitted all the columns emitted by the plugin but we can pick and choose some of the columns. Before we can filter and select some columns from the pslist plugin, we need to know exactly what type of output the plugin is producing. The column names are human readable and may not always correspond to the specific name of the column itself.

We use the describe plugin to describe the output of the pslist plugin in much the same way that the SQL describe statement describes a table:
Next to each field, we see the type of the field. This gives us an idea of what type of filtering operation is possible with this field. For example, let's display only the wow64 processes and show which session they are running in. The wow64 field is a boolean field so we simply evaluate it in the where clause:

select _EPROCESS, session_id, wow64 from pslist() where wow64
If you look closely at the output you might notice that the _EPROCESS column is actually split into 3 different columns, the virtual address, the process name and the process pid. Similarly in the output of "describe pslist" above there is no specific column for the process name or pid - all we see is a single _EPROCESS column with a type of _EPROCESS.

What is going on? How can we select only the process name?

When a plugin emits a more complex type (in this case the plugin emits raw _EPROCESS objects), Rekall might employ a specialized renderer (or a customized output format) for this column. In this case a single _EPROCESS object is shown as a small table with three columns. However, Efilter actually sees the raw object itself. Therefore as far as the search plugin is concerned the pslist plugin emits raw _EPROCESS objects for each row.

We can use this fact by dereferencing fields inside the _EPROCESS object itself within a search query. Let us repeat the "describe pslist" plugin, but this time we tell it to show sub fields to a depth of 1 (In this screenshot we snipped many of the fields because the output is long. The plugin actually shows all the members of the _EPROCESS struct, as well as Rekall defined pseudo-members and properties.):

We can see that the _EPROCESS object itself contains many fields, and they can all be used as filtering targets, columns and sort orders. Here is something a bit more complex:

select,, wow64 from pslist() where regex_search("svchost", order by desc

Here we sort by pid in reverse and show all the processes which match "svchost.exe" as well as their Wow64 status. Note that the built in search function finds the case insensitive regular expression anywhere in the filename.

Being able to drill into the objects returned by plugins allows users to invent completely different tables, even extending the output the original plugin was not designed to produce.

For example, closer inspection of the _EPROCESS object (either with the describe plugin or the dt plugin)  reveals that extra information is present in the _SE_AUDIT_PROCESS_CREATION_INFO struct produced by the windows auditing system. We also see a FullPath member on the _EPROCESS object (This is actually a virtual member added by Rekall which displays the full path to the binary running the process). Let's find all the processes which were started from locations other than the Windows directory and also show the audit system's record of where they were started:

SELECT,, _EPROCESS.FullPath, _EPROCESS.SeAuditProcessCreationInfo.ImageFileName.Name AS audit_name FROM pslist() WHERE NOT regex_search("Windows", _EPROCESS.FullPath)

Note three things in this query: the regex_search() function applies a regular expression to a field, the not operator excludes the rows which match, and the as operator renames a column to a more meaningful name.

NOTE: Efilter currently provides the =~ operator for a regular expression match, however this matching is case sensitive. When matching Windows file names we never want a case sensitive match, or we might miss some filenames which should match. Therefore it is always preferable to use the case insensitive regex_search() function instead.
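The difference is easy to demonstrate with Python's own re module. This is only an analogy for the behaviour described above, not Efilter code, and the path is invented for the example:

```python
import re

# Hypothetical path, as it might be stored with upper case directory names.
path = r"C:\WINDOWS\system32\svchost.exe"

# A case sensitive search misses the differently cased directory name.
case_sensitive = re.search("Windows", path) is not None

# A case insensitive search finds the pattern anywhere in the string.
case_insensitive = re.search("Windows", path, re.IGNORECASE) is not None

print(case_sensitive, case_insensitive)
```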


Sometimes it is useful to take the result of one search and apply it as an input to another plugin. Efilter supports this concept as a subquery. For example, suppose we asked - which processes were launched by a particular user? There is already a plugin which tells us this - the tokens() plugin:

select * from tokens() where regex_search("User: a", Comment)

We could also select by Sid but Rekall already resolves the Sid to a username for us. Now we would like to know which of these processes holds an open handle to the pmem driver?

select _EPROCESS, handle, access, details from handles(pids: (select from tokens() where regex_search("User: a", Comment)).pid) where regex_search("pmem", details)

The above example is a bit hard to follow because it is all on one line and has a subselect clause. Efilter also allows us to save entire queries and therefore make the query more readable. To make this easier we can use the %%search magic command in the Rekall shell. This allows us to write more complex, multi-line queries:

[1] win7.elf 20:37:02> %%search
let user_a_processes=select from tokens() where regex_search('User: a', Comment)
select _EPROCESS, handle, access, details from handles(pids: where regex_search('pmem', details)

How does this work?
  1. The let assignment stores a query in the variable user_a_processes. Note that it does not execute the query at this point. A stored query is simply a table with columns and rows:
    1. In this case there is only one column called "pid" and several rows.
  2. The next query executes the handles plugin and provides the stored query to the "pids" parameter. Since the query is just a table, we need to choose which column to expand into the pids arguments (this is the purpose of the second ".pid"). Now the plugin will receive a list of pids to operate on.
  3. The output from the handles plugin (restricted by the pids selected by the first query) is further filtered for a file handle matching "pmem"
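The three steps can be mimicked in plain Python to show why the stored query is lazy. This is a sketch under stated assumptions: tokens and handles here are hypothetical stand-ins for the real plugins, and all pids and handle names are invented.

```python
calls = []  # records which (hypothetical) plugins ran, and in what order


def tokens():
    """Hypothetical stand-in for the tokens plugin."""
    calls.append("tokens")
    yield {"pid": 4, "Comment": "User: SYSTEM"}
    yield {"pid": 1234, "Comment": "User: a"}
    yield {"pid": 2468, "Comment": "User: a"}


def handles(pids):
    """Hypothetical stand-in for the handles plugin, restricted to pids."""
    calls.append("handles")
    table = {1234: [r"\Device\pmem"], 2468: [r"\Device\Null"]}
    for pid in pids:
        for details in table.get(pid, []):
            yield {"pid": pid, "details": details}


def user_a_processes():
    """Step 1: the stored query. Defining it runs nothing."""
    return (row for row in tokens() if "User: a" in row["Comment"])


ran_before_use = list(calls)  # still empty: no plugin has executed yet

# Step 2: expand the "pid" column of the stored query into the pids argument.
pids = [row["pid"] for row in user_a_processes()]

# Step 3: filter the restricted handles output.
hits = [row for row in handles(pids) if "pmem" in row["details"]]
print(hits)
```

Because the handles stand-in only ever sees the two selected pids, it never has to enumerate handles for the other processes.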

Note that we could have just run the handles() plugin without arguments and filtered on the output, but this would have been inefficient because Efilter would need to list all the handles for all the processes and then filter out those processes we don't care about. It is always important to reduce the total number of processes examined up front by providing good process selectors to plugins which take them. Efilter does not currently have a good feel for the cost of running each plugin, so it is up to the user to decide in which order the queries should be run and how the output is to be combined.

In this blog post we have demonstrated how Efilter queries are useful to tailor exactly the output you need from Rekall plugins. In the next blog post we will discuss how to harness this power in formulating and recovering forensic artifacts from memory images.

Monday, May 23, 2016

Rekall and the windows PFN database

Rekall has long had the capability of scanning memory for various signatures. Sometimes we scan memory to try and recover pool tags (e.g. psscan), other times we might scan for specific indicators or Yara rules (e.g. yarascan). In the latest version of Rekall, we have dramatically improved the speed and effectiveness of these capabilities. This post explains one aspect of the new Rekall features and how it was implemented and can be used in practice to improve your forensic memory analysis.

Traditionally one can scan the physical address space, or the virtual address space (e.g. the kernel's space or the address space of a process). There are tradeoffs with each approach. For example, scanning physical memory is very fast because IO is optimized. Large buffers are read contiguously from the image and the signatures are applied on entire buffers. However, traditionally, this kind of scanning could only yield a result when a signature was found but did not include any context to the hit, which means that it is difficult to determine which process owned the memory and what the process was doing with the memory.

If we scan the virtual address space we see memory in the same way that a process sees it. This is ideal because if a signature is found, we can immediately determine which process address space it appears in, and precisely where, in that address space, the signature resides.

Unfortunately, scanning in a virtual address space is more time consuming because reading from the image (or live memory) is non-sequential. Rekall effectively has to glue together in the right order a bunch of page size buffers collected randomly from the image into a temporary buffer which can be scanned - this involves a lot of copying buffers and allocating memory. Additionally when scanning the address space of processes, we will invariably be scanning the same memory multiple times because mapped files (like DLLs) are shared between many processes, and so they appear in multiple processes' virtual address space.

It would be awesome if we could ask: given a physical address (i.e. offset in the memory image), which process owns this page and what is the virtual address of this page in the process address space? Being able to answer this question quickly allows us to scan physical memory in the most efficient way (at least for smallish signatures which do not span page boundaries).

Rekall has a plugin called pas2vas which aims to solve this problem. It is a brute force plugin: simply enumerate all the virtual address to physical address mappings and build a large reverse mapping. This works well enough, but takes a while to construct the reverse mapping and because of this does not work well on live memory which is continuously changing.
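The brute force idea is simple enough to sketch in Python. This is not the pas2vas code: the mappings dictionary is hypothetical data standing in for the result of enumerating every process's page tables.

```python
from collections import defaultdict

PAGE_SIZE = 0x1000

# Hypothetical per-process page mappings: virtual page -> physical page.
mappings = {
    "svchost.exe": {0x7FF00000: 0x1A2B3000, 0x00400000: 0x0C0DE000},
    "explorer.exe": {0x7FF00000: 0x1A2B3000},  # a shared DLL page
}


def build_reverse_map(mappings):
    """Enumerate every virtual -> physical mapping and invert it."""
    reverse = defaultdict(list)
    for process, pages in mappings.items():
        for vaddr, paddr in pages.items():
            reverse[paddr].append((process, vaddr))
    return reverse


def physical_to_virtual(reverse, paddr):
    """Return (process, virtual address) pairs for a physical address."""
    page = paddr & ~(PAGE_SIZE - 1)
    offset = paddr & (PAGE_SIZE - 1)
    return [(proc, vaddr + offset) for proc, vaddr in reverse.get(page, [])]


reverse = build_reverse_map(mappings)
owners = physical_to_virtual(reverse, 0x1A2B3010)
print([(proc, hex(vaddr)) for proc, vaddr in owners])
```

The cost is all in build_reverse_map: every mapping of every process has to be visited before the first lookup, and the result goes stale as soon as the page tables change.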

Have you ever used the RamMap.exe tool from Sysinternals? It’s an awesome tool which lets one see what each physical page is doing on your system. Here is an example screenshot:

This looks exactly like what we want! How does this magic work? Through understanding and parsing the Windows PFN database (Windows Internals) it is possible to relate a physical address directly to the virtual address in the process which owns it very quickly and efficiently. If only we had this capability in Rekall we could provide sufficient context to scanners in physical memory to work reliably and quickly!

Let's explore how one can use the Windows PFN database to ask exactly what each physical page is doing. In the end we implemented new plugins such as "pfn", "p2v" and "rammap" to shed more light on how physical memory is used within the Windows operating system. These plugins are integrated with other plugins (e.g. yarascan) to assist in providing more contextual information for physical addresses.

Windows page translation.

We all know about the AMD64 page tables and how they work so I won't go into it in too much depth here. Just to say that the hardware needs page tables to be written in memory, which control how the page resolution process works. The CR3 register contains the Directory Table Base (DTB) which is the address of the top most table.

The hardware then traverses these tables, by masking bits off the virtual address until it reaches the PTE which contains information about the physical page representing that virtual address. The PTE may be in a number of states (which we described in detail in previous posts and are also covered in the Windows Internals book). In any state other than the "Valid" state, the hardware will generate a page fault to find the actual physical page.
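As a reminder of what "masking bits off the virtual address" means for standard 4 level AMD64 paging with 4 KiB pages, here is a small Python sketch. The function name and the example address are ours, chosen only for illustration:

```python
def split_virtual_address(vaddr):
    """Split an AMD64 virtual address into its four table indexes.

    Each index is 9 bits wide (512 entries per table) and the low
    12 bits are the byte offset inside the 4 KiB page.
    """
    return {
        "pml4": (vaddr >> 39) & 0x1FF,
        "pdpt": (vaddr >> 30) & 0x1FF,
        "pd": (vaddr >> 21) & 0x1FF,
        "pt": (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,
    }


# Hypothetical kernel address, used only as an example.
parts = split_virtual_address(0xFFFFF80002A1B000)
print(parts)
```

The hardware uses each index in turn to pick one entry out of the table found at the previous level, starting from the table that CR3 points at.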

Rekall has the "vtop" plugin (short for virtual-to-physical) to help visualize what page translation is doing. Let us take for example a Windows 7 image:

Here we ask Rekall to translate the symbol for "nt" (The kernel's image base) in the kernel address space into its physical address. Rekall indicates the address of each entry in the 4 level page table and finally prints the PTE content in detail. We can see that the PTE for this symbol is a HARDWARE PTE which is valid (i.e. the page exists in physical memory) and the relevant page frame number is shown (i.e. physical address).

The important points to remember about page translation are:
  • The page tables are primarily meant for the hardware. Addresses for valid pages in the page tables are specified as Page Frame Numbers (PFN) which means they are specified in the physical address space. This is the only thing the MMU can directly access.
  • On the other hand, the CPU cannot directly access physical memory. All CPU access occurs through the virtual address space of the kernel.
    • Invalid PTEs may carry any data to be used by the kernel, therefore typically those addresses are specified in the kernel's virtual address space.
  • All PTEs (and all parts of the page tables) must be directly mapped into the kernel's address space at all times so that the kernel may manipulate them.

Each PTE controls access to exactly one virtual memory page. The PTE is the basic unit of control for virtual pages, and each page in any virtual address space must have at least one PTE controlling it (This is regardless if the page is actually mapped into physical memory or not - the PTE will indicate where the page can be found).

There are two types of PTEs - hardware PTEs and prototype PTEs. Prototype PTEs are used by the kernel to keep track of intended page purpose, and although they have a similar format to hardware PTEs, they are never accessed directly by the MMU. Prototype PTEs are allocated from pool memory, while hardware PTEs are allocated from the System PTE address range.

Section objects

Consider a mapped file in memory. Since the file is mapped into some virtual address space, some pages are copied from disk into physical memory, and therefore there are some valid PTEs for that address space. At the same time, other pages are not read from disk yet, and therefore must have a PTE which points at a prototype PTE.

The _SUBSECTION object is a management structure which keeps track of all the mapped pages of a range from mapped files. The _SUBSECTION object has an array of prototype PTEs - a management structure similar to the real PTE.
Consider the figure above - a file is mapped into memory and Windows creates a _SUBSECTION object to manage it. The subsection has a pointer to the CONTROL_AREA (which in turn points to the FileObject which it came from) and pointers to the Prototype PTE array which represents the mapped region in the file. In this case a process is reading the mapped area and so the hardware PTE inside the process is actually pointing at a memory resident page. The prototype PTE is also pointing at this page.

Now imagine the process gets trimmed - in this case the hardware PTE will be made invalid and point at the prototype PTE. If the process tries to access the page again a page fault will occur and the pager will consult the prototype PTE to determine if the page is still resident. Since it is resident the hardware PTE will be just changed back to valid and continue to point at that page.

Note that in this situation, the physical page still contains valid file data, and it is still resident. It's just that the page is not directly mapped in any process. Note that it is perfectly OK to have another process with a valid hardware PTE mapping to the same page - this happens if the page is shared with multiple processes (e.g. a DLL) - one process may have the page resident and can access it directly, while another process might need to invoke a page fault to access this page (which should be extremely quick since the page is already resident).

The Page Frame Number Database (PFN).

In order to answer the question "What is this page doing?", Windows has the Page Frame Number database (PFN Db). It is simply an array of _MMPFN structs which starts at the symbol "nt!MmPfnDatabase" and has a single entry for every physical page on the system. The _MMPFN struct must be as small as possible, so it consists of many unions and can be in several states - depending on the state, different fields must be interpreted.

Free, Zero and Bad lists

We start off by discussing these states - they are the simplest to understand. Pages which can take on any of these states (a flag in _MMPFN.u3.e1.PageLocation) are kept in their own lists of Free pages (ready to be used), Zero pages (already cleared) or Bad pages (will never be used).

Active Pages: PteAddress points at a hardware PTE.

If the PFN Type is set to Active, then the physical page is used by something. The most important thing to realize is that a valid physical page (frame) must be managed by a PTE.  Since that PTE record must also be accessible to the kernel, it must be mapped in the kernel's virtual address space.

When the PFN is Active, it contains 3 important pieces of information:
  1. The virtual address of the PTE that is managing this physical page (in _MMPFN.PteAddress).
  2. The Page Frame (Physical page number) of the PTE that is managing this physical page (in _MMPFN.u4.PteFrame). Note these two values provide the virtual and physical address of the PTE.
  3. The OriginalPte value (usually the prototype PTE which controls this page). When Windows installs a hardware PTE from a prototype PTE, it will copy the original prototype PTE into this field.

Here is an example of Rekall's pfn plugin output for such a page:

The interesting thing is that in this case, the PTE that is managing this page will belong in the Hardware page tables created for the process which is using this page. That PTE, in turn will also be accessible by a PDE inside that process's page tables, and so forth. This occurs all the way up to the root of the page table (DTB or CR3) which is its own PTE.

Therefore if we keep following the PTE which controls each PTE 4 times we will discover the physical addresses of the DTB, PML4, PDPTE, PDE and PTE belonging to the given physical address.  Since a DTB is unique to a process we immediately know which process owns this page.

Additionally we can figure out the virtual address. At each level of the paging structure we subtract the start of the table from the address of the controlling entry to get the entry's offset within that table, convert that offset into an index, and assign the index to the corresponding bits of the virtual address. This is illustrated below.
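A minimal Python sketch of this reconstruction follows. It assumes standard 4 level AMD64 paging with 8 byte entries, and it is not the code of the Rekall plugin: the function name is ours and the table and entry addresses in the example are invented.

```python
ENTRY_SIZE = 8  # each AMD64 page table entry is 8 bytes


def rebuild_virtual_address(pml4e, pdpte, pde, pte, page_offset=0):
    """Rebuild a virtual address from the controlling entry at each level.

    Each argument is a (table_start, entry_address) pair of physical
    addresses found by following the controlling PTEs upwards.
    """
    vaddr = page_offset
    levels = zip((39, 30, 21, 12), (pml4e, pdpte, pde, pte))
    for shift, (table_start, entry_address) in levels:
        index = (entry_address - table_start) // ENTRY_SIZE
        vaddr |= index << shift
    # Sign extend bit 47 to obtain the canonical 64 bit form.
    if vaddr & (1 << 47):
        vaddr |= 0xFFFF << 48
    return vaddr


# Hypothetical table and entry addresses, for illustration only.
vaddr = rebuild_virtual_address(
    pml4e=(0x187000, 0x187F80),
    pdpte=(0x03C000, 0x03C000),
    pde=(0x03D000, 0x03D0A8),
    pte=(0x03E000, 0x03E0D8),
)
print(hex(vaddr))
```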

So when the PFN is in this state (i.e. PteAddress pointing to a hardware PTE) we can determine both the virtual address of this page and the process which maps it. It is also possible that another process is mapping the same page too. In this case the OriginalPte will actually contain the _MMPTE_SUBSECTION struct which was originally filled in the prototype PTE. We can look at this value and determine the controlling subsection in a similar way to the method described below.

Rekall's ptov plugin (short for physical-to-virtual) employs this algorithm to derive the virtual address and the owning process. Here is an example:

We can verify this works by switching to the svchost.exe process context and converting the virtual address to physical. We should end up in the same physical address we started with:

Active Pages - PteAddress points at a prototype PTE.

Consider the case where two or more processes are sharing the same memory (e.g. mapping the same file). In order to aid in the management of this, Windows will create a subsection object as described earlier; if the virtual page is trimmed from the working set of one process, the hardware PTE will not be valid and instead point at the controlling subsection's prototype PTE.

In this case the PFN database will point directly at the prototype PTE belonging to the controlling subsection (the PFN entry will indicate that this is a prototype PTE with the _MMPFN.u3.e1.PrototypePte flag). Let's look at an example:

In this example, the PFN record indicates that a prototype PTE is controlling this physical page. The prototype PTE itself indicates that the page is valid and mapped into the correct physical page. Note that the controlling PTE for this page is allocated from system pool (0xf8a000342d50) while in the previous example, the controlling PTE was from the system PTE range (0xf68000000b88) and belonged to the process's hardware page tables.

If we tried to follow the same algorithm as before, we would actually end up in the kernel's DTB because the prototype PTE is itself allocated from paged pool (so its controlling PTE belongs to the kernel's page tables). So in this case we need to identify the relevant subsection which contains the prototype PTE (and the processes that map it).

When a process maps a file, it receives a new VAD region. The _MMVAD struct stores the following important information:
  1. The start and end virtual addresses of the VAD region in the process address space.
  2. The Subsection object which is mapped by this VAD region.
  3. The first prototype in the subsection PTE array which is mapped (Note that VADs do not have to map the entire subsection, the first mapped PTE can be in the middle of the subsection PTE array. Also the subsection itself does not have to map the entire file either - it may start at the _SUBSECTION.StartingSector sector).

The _MMPFN.PteAddress will point at one of the Prototype PTEs. We build a lookup table between every VAD region in every process and its range of prototype PTEs. We are then able to quickly determine which VAD regions in each process contain the pointed to PTE address, and so we know which process is mapping this file.
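The lookup can be sketched as follows. This is an illustration of the idea rather than Rekall's implementation: the VadRange record and all addresses are invented, and a linear scan is used for clarity where a real tool would more likely use a sorted or interval based structure.

```python
from collections import namedtuple

PAGE_SIZE = 0x1000
PTE_SIZE = 8

# One record per VAD region: the owning process, the region's start
# address and the range of prototype PTEs it maps (hypothetical layout).
VadRange = namedtuple("VadRange", "process vad_start first_pte last_pte")

table = [
    VadRange("svchost.exe", 0x77000000, 0xF8A000342D00, 0xF8A000342DF8),
    VadRange("explorer.exe", 0x76F00000, 0xF8A000342D40, 0xF8A000342DF8),
    VadRange("lsass.exe", 0x10000000, 0xF8A000500000, 0xF8A000500078),
]


def find_mappers(table, pte_address):
    """Return (process, virtual address) for each VAD mapping this PTE."""
    hits = []
    for entry in table:
        if entry.first_pte <= pte_address <= entry.last_pte:
            # Each prototype PTE after the first maps the next virtual page.
            page_index = (pte_address - entry.first_pte) // PTE_SIZE
            hits.append((entry.process,
                         entry.vad_start + page_index * PAGE_SIZE))
    return hits


mappers = find_mappers(table, 0xF8A000342D50)
print([(proc, hex(vaddr)) for proc, vaddr in mappers])
```

Note how the two matching processes see the same page at different virtual addresses, because their VAD regions start at different points in the subsection's prototype PTE array.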

The result is that we are able to list all the processes mapping a particular physical page, as well as the virtual addresses each use for it (using the _MMVAD information). We also can tell which file is mapped at this location (from the _SUBSECTION data and the filename) and the sector offset within the file it is mapped to. Here is the Rekall output for the ptov plugin:

Rekall is indicating that this page contains data from the oleaut32.dll at file offset 0x8a600. And as you can see in the output this data is shared with a large number of processes.

Putting it all together

We can utilize these algorithms to provide more context for scanning hits in the physical address space. Here is an example where I search for my name in the memory image using the yarascan plugin:

The first hit shows that this page belongs to the rekall.exe process, mapped at 0x5522000. The second hit occurs at offset 0x177800 inside the file called winpmem_2.0.1.exe etc.

This information provides invaluable context to the analyst and helps reasoning about why these hits occur where they do.

The rammap plugin aims to display every page and what it is being used for. We can see that some pages are owned by processes, some are shared by files and others belong to the kernel pools:

Other applications: Hook detection

Inline hooking is a very popular way to hijack the execution path in a process or library. Malware typically injects foreign code into a process then overwrites the first few bytes of some critical functions with a jump instruction to detour into its code. When the API function is called, the malware hijacks execution and then typically relays the call back to the original DLL to service the API call. For example it is a common way to hide processes, files or network connections.

Here is an example output from Rekall's hooks_inline plugin which searches all APIs for inline hooks.

In this sample of Zeus (taken from the Malware Analyst's Cookbook), we can clearly see a jump instruction inserted at the start of critical functions (e.g. NtCreateThread) so that Zeus can monitor calls to these APIs. Rekall detects the hooks by searching the first few instructions for constructs which divert the flow of execution (e.g. jump, push-ret etc).

Let us consider what happens in the PFN database when Zeus installs these hooks. Before the hook installation, the page containing the functions is mapped to the DLL file from disk. When Zeus installs the trampoline by writing to the virtual memory, Windows changes the written virtual page from file backed to a private mapping. This is often called copy-on-write semantics - Windows makes a copy of the mapped file private to the process whenever the process tries to write to the page. Even if the page is shared between multiple processes, only the process which wrote to it would see these changes.

Let's examine the PFN record of the hooked function. First we switch to the process context, then find the physical page which backs the function "ntdll!NtCreateThread" (Note we can use the function's name interchangeably with the address for functions Rekall already knows about).

Now let's display the PFN record (Note that a PFN is just a physical address with the last 3 hex digits removed):
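With 4 KiB pages the conversion is a 12 bit shift, since 3 hex digits are 12 bits. A small Python sketch, where the function names and the example address are ours:

```python
PAGE_SHIFT = 12  # 4 KiB pages: 12 bits, i.e. the last 3 hex digits


def physical_to_pfn(paddr):
    """Drop the in-page offset to get the page frame number."""
    return paddr >> PAGE_SHIFT


def pfn_to_physical(pfn, offset=0):
    """Rebuild a physical address from a PFN and an in-page offset."""
    return (pfn << PAGE_SHIFT) | offset


pfn = physical_to_pfn(0x2A1B123)  # hypothetical physical address
print(hex(pfn), hex(pfn_to_physical(pfn, 0x123)))
```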

Notice that the controlling PTE is a hardware PTE (which means it exists in the process's page tables). There is only a single reference to this page which means it is private (Share Count is only 1).

Let's now examine the very next page in ntdll.dll (The next virtual page is not necessarily the next physical page so we need to repeat the vtop translation - again we use Rekall's symbolic notation to refer to addresses):

And we examine the PFN record for this next page:

It is clearly a prototype page, which maps a subsection object. It is shared with 22 other processes (ShareCount = 22). Let's see how this physical page is mapped in the virtual address:

So the takeaway from this exercise is that by installing a hook, Zeus has converted the page from a shared to a private mapping. If we assume that Zeus does not change files on disk, then memory-only hooks can only exist in process private pages and not in shared file pages. It is therefore safe to skip shared pages in our hook analysis. This optimization buys a speedup of around 6-10 times for inline hook inspection - all thanks to the Windows PFN database!
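The optimization boils down to a filter applied before the expensive disassembly step. The sketch below only illustrates that filter: the record fields are hypothetical and do not reflect the actual output of the pfn plugin or the internals of hooks_inline.

```python
# Hypothetical per-page summaries derived from PFN records.
pages = [
    {"vaddr": 0x77A01000, "prototype_pte": False, "share_count": 1},
    {"vaddr": 0x77A02000, "prototype_pte": True, "share_count": 22},
    {"vaddr": 0x77A03000, "prototype_pte": True, "share_count": 22},
]


def pages_worth_checking(pages):
    """Keep private pages only.

    A page controlled by a prototype PTE is still backed by the shared
    file mapping, so (assuming the file on disk is untouched) it cannot
    contain a memory-only hook.
    """
    return [page for page in pages if not page["prototype_pte"]]


candidates = pages_worth_checking(pages)
print([hex(page["vaddr"]) for page in candidates])
```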