Analyzing the heap(s) of a Win32 program

 

By George Mihaescu

 

 

Summary: this document attempts to introduce you to the way memory allocations made by your C++ program are actually seen by the host Windows operating system, and to what you can do to analyze the content of the memory you’ve allocated. It also covers issues related to memory fragmentation and what you can do to improve the CRT memory management with regard to fragmentation.

Understanding this is often critical when tracking down hard-to-find memory leaks and when writing tools that verify that programs don’t leak memory.

Heaps

In C++ terminology, “heap allocation” refers to dynamic allocation, i.e. non-stack allocation. In the C++ semantics a program has one memory “heap” from which all dynamic allocations get their memory.

 

Unfortunately, Windows uses the term “heap” with a completely different meaning. A Windows heap is a memory-management object, identified by a handle, that is associated with a running process. It is one mechanism that allows Win32 processes to manage memory, but not the only one: as a programmer you may decide to use Windows heaps for your memory needs, or a different mechanism (such as a memory-mapped file or virtual memory). Below I will focus on Windows heaps, so please interpret the term “heap” from here on in the Windows sense.

 

A typical Win32 process has more than one heap, but always at least one. When the OS maps an executable image into memory, it creates the primary (default) heap for the new process. Other heaps can be created by the program itself or by the various libraries the program uses. Such heaps are created and managed via API calls such as HeapCreate, HeapAlloc, HeapReAlloc, HeapFree and HeapDestroy.

 

CRT and the Windows heaps

As a C/C++ programmer, one library that your program always uses is the CRT. As part of its initialization, the Microsoft implementation of the CRT creates one such extra heap for all its allocations (the handle of this CRT heap is stored internally in the CRT library, in a global variable called _crtheap). Every time a process attaches to the CRT, the CRT initialization code checks whether its heap has been created and, if not, creates it. The CRT maps all calls to malloc / free and new / delete to the heap-oriented Win32 API functions (HeapAlloc, HeapFree, etc.), which allocate / deallocate in the dedicated heap (the one whose handle is in _crtheap).

 

It must be said that various releases of the CRT have worked differently with the underlying Windows heap they created. Since an early stage (4.x), the DEBUG and RELEASE versions of the CRT have differed quite substantially. For example:

·         The CRT DEBUG version allocated blocks in the heap that were larger than the size requested by the program, to accommodate the overhead that the CRT DEBUG mode needed (markers around the block that let the CRT detect heap overwrites at the time the block was freed and give you a warning, etc.).

·         The CRT RELEASE version allocated blocks in the heap of exactly the size requested by the program (i.e. no “guarding” of blocks), but requests for blocks of 480 bytes or less did not actually go to the heap! This was a technique for avoiding a situation called “memory fragmentation”: programs that allocate and free small blocks of memory very frequently and in random order risk creating a very fragmented heap (i.e. lots of small free blocks scattered between the allocated ones, so that a request for a new allocation may not find a free block large enough to accommodate the requested size). The more fragmented the memory, the slower the program becomes, as the memory management code needs to hunt for a large enough free block and, if none is found, make more memory available. In order to avoid this, the RELEASE mode CRT used a threshold of 480 bytes (probably determined empirically) to decide what counts as a “small block”; requests of 480 bytes or less did not go to the heap, like all other requests, but to virtual memory areas managed internally by the CRT. Upon initialization, the CRT reserved an area of 4 Mbytes of the process address space (usually starting at address 0x00410000), then committed the first 64 Kbytes of this area (all done with calls to VirtualAlloc). From then on, every call to malloc (or invocation of the global new operator) checked the requested size: blocks larger than 480 bytes were allocated in the heap (just like in DEBUG mode, with HeapAlloc), while those of 480 bytes or less were allocated in the 64 Kbytes committed area. This continued until the whole 64 Kbytes area was full, at which point more memory was committed out of the remaining reserved area (initial size: 4 Mbytes - 64 Kbytes).
If lots of small block allocations (<= 480 bytes) were being done, then upon running out of space in the 4 Mbytes area the CRT repeated the process: it reserved another 4 Mbytes area in the process address space, committed its first 64 Kbytes, and so on. In other words, in RELEASE mode the “heap” (in the C++ sense) is made of the Windows heap plus the special "small block" area managed by the CRT using the virtual memory APIs.
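The fragmentation problem described above can be reproduced with a toy model. The sketch below is not how the Windows heap or the CRT works internally: ToyArena is a hypothetical first-fit allocator over a handful of equal-sized units, written only to show that the total free space can be sufficient while no single free run is large enough for a request.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy arena: each element is one allocation unit (true = in use).
class ToyArena
{
public:
    explicit ToyArena(std::size_t units) : used_(units, false) {}

    // First-fit search: returns the index of the first unit of the block,
    // or -1 if no run of 'units' contiguous free units exists.
    long allocate(std::size_t units)
    {
        std::size_t run = 0;
        for (std::size_t i = 0; i < used_.size(); ++i)
        {
            run = used_[i] ? 0 : run + 1;
            if (units > 0 && run == units)
            {
                const std::size_t start = i + 1 - units;
                for (std::size_t j = start; j <= i; ++j)
                    used_[j] = true;
                return static_cast<long>(start);
            }
        }
        return -1;
    }

    // Marks 'units' units starting at 'start' as free again.
    void release(std::size_t start, std::size_t units)
    {
        for (std::size_t j = start; j < start + units && j < used_.size(); ++j)
            used_[j] = false;
    }

    // Total number of free units, contiguous or not.
    std::size_t total_free() const
    {
        std::size_t n = 0;
        for (bool u : used_)
            if (!u) ++n;
        return n;
    }

    // Length of the longest run of contiguous free units.
    std::size_t largest_free_run() const
    {
        std::size_t best = 0, run = 0;
        for (bool u : used_)
        {
            run = u ? 0 : run + 1;
            if (run > best) best = run;
        }
        return best;
    }

private:
    std::vector<bool> used_;
};

// Fills an 8-unit arena with 1-unit blocks, frees every other block, then
// checks that a 2-unit request fails although 4 units are free in total.
bool fragmented_request_fails()
{
    ToyArena arena(8);
    for (int i = 0; i < 8; ++i)
        arena.allocate(1);
    for (std::size_t i = 0; i < 8; i += 2)
        arena.release(i, 1);
    return arena.total_free() == 4
        && arena.largest_free_run() == 1
        && arena.allocate(2) == -1;
}
```

A real memory manager would respond to the failed request by making more memory available, which is exactly the extra work (and extra memory) that a dedicated small-block area is meant to avoid.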

This means that using the heap walking techniques described below, in RELEASE mode you will not be able to see some of the small (<= 480 bytes) objects in the process heaps! In DEBUG mode those will be visible, but in RELEASE mode those blocks will just “vanish” from the heaps – keep this in mind when you are interpreting the results you get from walking the heaps!
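To make this consequence concrete, here is a small model of the routing rule and of the region arithmetic described above. The figures (480 bytes, 4 Mbytes, 64 Kbytes) are the ones quoted in this text for the older RELEASE mode CRT; the function and constant names are hypothetical, and the model ignores any bookkeeping overhead the CRT may add.

```cpp
#include <cstddef>

// Figures quoted in the text for the older RELEASE mode CRT. They describe
// that CRT version, not Windows itself.
const std::size_t kSmallBlockThreshold = 480;                // bytes
const std::size_t kReservedRegion      = 4u * 1024u * 1024u; // 4 Mbytes reserved
const std::size_t kCommitStep          = 64u * 1024u;        // 64 Kbytes committed at a time

// Returns true if a request of 'bytes' bytes would be served with HeapAlloc,
// and would therefore show up when walking the CRT heap.
bool visible_to_heap_walk(std::size_t bytes, bool release_build)
{
    if (!release_build)
        return true;                      // DEBUG: every request goes to the heap
    return bytes > kSmallBlockThreshold;  // RELEASE: small blocks bypass the heap
}

// Number of 64 Kbytes commit steps available in one 4 Mbytes reserved region.
std::size_t commit_steps_per_region()
{
    return kReservedRegion / kCommitStep;
}
```

Under these assumptions a 480-byte request made by a RELEASE build is not visible to a heap walk while a 481-byte request is, and each reserved region can be committed in 64 steps before a new region has to be reserved.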

 

Important notes:


But there is something you can do to improve this: at the very beginning of your program you can either:

 

Analyzing the heaps

To recap:

 

Win32 provides the GetProcessHeap API to get the primary (default) heap of a process. To get all the heaps of a process, use the GetProcessHeaps API.

 

The HeapWalk API allows you to traverse the list of blocks in a particular heap (whose handle was obtained with GetProcessHeap or GetProcessHeaps). This way, a debugging module that is part of the process can obtain the size of all the heaps, or of any particular heap, of the process, which assists in tracking down memory leaks.

 

Below is the code for a very simple function that lists the allocated blocks in all the heaps used by the process calling it. Note that this code has been updated for VS 2005 and will not compile with older versions of Visual Studio because of the _get_heap_handle CRT call and the HeapQueryInformation Win32 API; you may want to comment those calls out when compiling with an older version:

 

#include <windows.h>
#include <malloc.h>       //_get_heap_handle
#include <stdio.h>
#include <string.h>

void Dump_Blocks_In_All_Heaps ()
{
       //get all the heaps in the process
       const DWORD max_heaps = 100;
       HANDLE heaps [max_heaps];
       const DWORD c = ::GetProcessHeaps (max_heaps, heaps);
       printf ("The process has %lu heaps.\n", c);
       if (c > max_heaps)
       {
              //the array is too small to hold all the heap handles
              printf ("Only up to %lu heaps are supported.\n", max_heaps);
              return;
       }

       //get the default heap and the CRT heap (both are among
       //those retrieved above)
       const HANDLE default_heap = ::GetProcessHeap ();
       const HANDLE crt_heap = (HANDLE) _get_heap_handle ();

       for (DWORD i = 0; i < c; i++)
       {
              //query the heap attributes
              ULONG heap_info = 0;
              SIZE_T ret_size = 0;
              if (::HeapQueryInformation (heaps [i],
                                          HeapCompatibilityInformation,
                                          &heap_info,
                                          sizeof (heap_info),
                                          &ret_size))
              {
                     //show the heap attributes
                     switch (heap_info)
                     {
                     case 0:
                            printf ("Heap %lu is a regular heap.\n", (i + 1));
                            break;
                     case 1:
                            printf ("Heap %lu is a heap with look-asides (fast heap).\n", (i + 1));
                            break;
                     case 2:
                            printf ("Heap %lu is a LFH (low-fragmentation) heap.\n", (i + 1));
                            break;
                     default:
                            printf ("Heap %lu is of unknown type.\n", (i + 1));
                            break;
                     }

                     if (heaps [i] == default_heap)
                     {
                            printf (" This is the DEFAULT process heap.\n");
                     }
                     if (heaps [i] == crt_heap)
                     {
                            printf (" This is the heap used by the CRT.\n");
                     }

                     //walk the heap and show each allocated block inside it
                     //(the attributes of each entry will differ between
                     //DEBUG and RELEASE builds)
                     PROCESS_HEAP_ENTRY entry;
                     memset (&entry, 0, sizeof (entry));
                     int count = 0;
                     while (::HeapWalk (heaps [i], &entry))
                     {
                            if (entry.wFlags & PROCESS_HEAP_ENTRY_BUSY)
                            {
                                   //cbData is a DWORD, cbOverhead is a BYTE
                                   printf (" Allocated entry %d: size: %lu, overhead: %d.\n", ++count, entry.cbData, entry.cbOverhead);
                            }
                     }
              }
       }
}

 

Note that the CRT offers a function called _heapwalk, provided as a debugging helper. This function is in fact implemented on top of the Win32 API HeapWalk, called for the dedicated CRT heap (identified by the handle in _crtheap). Therefore this function only works for memory that has been allocated through CRT calls. If memory blocks are allocated using the Win32 heap APIs (HeapAlloc, etc.), these blocks will not be traced by the CRT _heapwalk function. In conclusion, use the CRT _heapwalk only to trace memory allocated through the CRT, and use the HeapWalk Win32 API to trace memory in any heap (i.e. regardless of the way it has been allocated).

 

IMPORTANT: There is another major limitation of _heapwalk: it depends on how the CRT has been linked into the various binary modules of the process. For instance, if a DLL is statically linked with the CRT, the linker incorporates a copy of the CRT into the DLL image. When that DLL is used by a process, calls to _heapwalk from the DLL functions use the handle in the copy of the _crtheap variable that is incorporated into the DLL, and therefore walk the heap of the DLL's private copy of the CRT. They miss every memory allocation done in the rest of the modules, although those were also done using CRT calls! Conversely, calls to _heapwalk from the calling module use the _crtheap handle stored in the CRT library used by that module, and miss every memory allocation done in the DLL! Basically, this is the usual problem of having the different binary modules of a process (main EXE and additional DLLs) linked statically with the CRT. Ideally, all binary modules of a program should be linked with the CRT in DLL form, so that they all share the same global variables. But this may not always be possible (e.g. when using DLLs from third parties). Consequently, when the development environment introduces this problem, it is much safer to go directly to the Win32 API rather than use the CRT calls.

However, keep in mind that, depending on the version of the CRT you are using (or on whether you have used the _set_sbh_threshold CRT function), you may not be able to track down the “small block” allocations by heap walking as described above.