Memory Allocation Implementation in the Microsoft C Runtime
Libraries for Win32
-Implications for memory leak detection tools-
By George Mihaescu
This article is obsolete because it refers strictly to old
versions of the Microsoft CRT libraries (4.x and 5.0). We have replaced it with
a more comprehensive and up-to-date article (see "Analyzing the heap(s) of
a Win32 program").
This article is still in our library because it goes into
details regarding the CRT 4.x and 5.0 memory management that the new article
does not cover (because of the substantial differences between the CRT versions
and the varying implementations of the heaps in the underlying operating system).
Summary: This document describes how memory
allocation (the malloc family of functions and the new operator)
is implemented in the Microsoft C Runtime Libraries (CRT). It applies to
versions 4.x and 5.0 of the CRT. Other versions may exhibit the behaviour
described below but have not been tested. The acronym CRT in this document
refers to the above-mentioned versions.
Background
The CRT offers the malloc
family of functions and the C++ global new operator for dynamic
allocations. The memory area in which allocations with malloc or new
are made is usually referred to as the heap. Any CRT implementation of these
memory allocation functions must use the lower-level memory allocation
facilities offered by the operating system it runs on.
The WinNT (4.0) and Win95
implementations of the Win32 API offer three mechanisms to manage the memory
space of a process:
· virtual memory (using the VirtualAlloc, VirtualFree family of APIs)
· memory-mapped files (using the Create/OpenFileMapping, MapViewOfFile
family of APIs)
· heaps (using the HeapCreate, HeapAlloc, HeapFree family of APIs).
The term heap in Win32 is
quite unfortunate since it overlaps with the concept of heap in the CRT.
As we will describe below, the two are not equivalent, and therefore we will
use the term winheap for the heaps offered by Win32 (created
with the HeapCreate API) and the term heapC for the area
where blocks allocated with malloc or new reside.
CRT implementation on WinNT
The C++ new operator is in
fact implemented based on the malloc function. The malloc
function has different implementations for DEBUG and RELEASE build modes.
· In DEBUG mode, the implementation of malloc (and
consequently of new) maps directly to allocations in the winheap (in
other words, malloc calls the Win32 HeapAlloc API). (Note: the
actual allocated size in DEBUG mode is slightly larger than the requested size,
since the CRT uses a small portion at the beginning of the block and another
one at the end as "sentinels" in order to detect out-of-bounds
writes to the memory blocks, as a very basic bounds-checking mechanism.) The
winheap is created when the CRT initializes (done by the CRT entry point,
calling the Win32 HeapCreate API).
Consequently, for DEBUG mode we can say that the heapC maps directly to the
winheap.
· In RELEASE mode, the implementation of malloc (and
consequently of new) is different: the size of the
actual allocated block is equal to the requested size, and the allocated block
does not necessarily go into a winheap as it does in DEBUG mode. Rather, some
blocks go into the winheap as before, while others are placed in virtual memory
areas managed internally by the CRT. Upon RELEASE mode initialization, the CRT
creates the winheap (just as in DEBUG mode), but it also reserves an area of 4
Mbytes of the process address space (usually starting at address 0x00410000)
and then commits the first 64 Kbytes of this area (all done with calls to
VirtualAlloc). From then on, every call to malloc (or invocation of the global
new operator) checks the requested memory size: blocks larger than 480 bytes
are allocated in the winheap (just like in DEBUG mode, with HeapAlloc),
while those of 480 bytes or less are allocated in the 64 Kbytes committed area.
This continues until the whole 64 Kbytes area is full, at which point
more memory is committed out of the remaining reserved area (initial size: 4
Mbytes - 64 Kbytes). If many small blocks (<= 480 bytes) are allocated and the
4 Mbytes area runs out of space, the CRT repeats the process: it reserves
another 4 Mbytes area in the process address space, commits its first 64
Kbytes, and so on. In other words, in RELEASE mode the heapC is
made of the winheap plus the special "small block" area
managed by the CRT using the virtual memory APIs.
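The RELEASE mode policy described above can be modelled in portable C. This is
only a sketch of the bookkeeping, assuming the figures given in the text (480
byte threshold, 64 Kbytes commit step, 4 Mbytes reserved area). The names
small_area, route_request, small_area_alloc and model_malloc are hypothetical
and are not CRT identifiers. The model ignores per-block headers, alignment and
freeing, and simply assumes that small blocks are packed one after another.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_BLOCK_MAX 480u                  /* threshold quoted in the text */
#define COMMIT_STEP     (64u * 1024u)         /* 64 Kbytes committed at a time */
#define REGION_SIZE     (4u * 1024u * 1024u)  /* 4 Mbytes reserved at a time */

typedef enum { ROUTE_SMALL_AREA, ROUTE_WINHEAP } route_t;

/* Bookkeeping for the modelled "small block" area (hypothetical type). */
typedef struct {
    size_t regions;    /* number of 4 Mbytes areas reserved so far */
    size_t committed;  /* bytes committed in the current area */
    size_t used;       /* bytes handed out in the current area */
} small_area;

/* State right after initialization: one area reserved, 64 Kbytes committed. */
static small_area small_area_init(void)
{
    small_area a = { 1, COMMIT_STEP, 0 };
    return a;
}

/* Decide where a request of `size` bytes would go. */
static route_t route_request(size_t size)
{
    return size <= SMALL_BLOCK_MAX ? ROUTE_SMALL_AREA : ROUTE_WINHEAP;
}

/* Account for one small block; commit or reserve more space if needed. */
static void small_area_alloc(small_area *a, size_t size)
{
    assert(size <= SMALL_BLOCK_MAX);
    if (a->used + size > REGION_SIZE) {
        /* Current area exhausted: reserve a new one, commit its first 64 Kbytes. */
        a->regions++;
        a->committed = COMMIT_STEP;
        a->used = 0;
    }
    a->used += size;
    while (a->used > a->committed)
        a->committed += COMMIT_STEP;  /* commit more of the reserved area */
}

/* Model of a RELEASE mode malloc call: only the accounting, no real memory. */
static route_t model_malloc(small_area *a, size_t size)
{
    route_t r = route_request(size);
    if (r == ROUTE_SMALL_AREA)
        small_area_alloc(a, size);
    return r;
}
```

Starting from small_area_init(), two 400 byte requests leave the committed size
at 64 Kbytes. Under these packing assumptions, the 164th such request is the
first one that forces another 64 Kbytes to be committed.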
Important note: The RELEASE mode behaviour described above was seen on the DLL
version of the CRT 4.x. The static version of CRT 4.x behaves the same in
RELEASE mode as in DEBUG mode. The CRT 5.0 exhibits the described behaviour in
RELEASE mode for both the static library and the DLL version.
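The DEBUG mode "sentinels" mentioned earlier can be illustrated with a small
wrapper around malloc. This is a minimal sketch of the general guard-byte
technique, not the CRT's actual implementation: the names guarded_malloc,
guarded_check and guarded_free, the 4 byte guard width, the 0xFD fill value and
the 32 byte header are all assumptions chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE  4     /* assumed guard width, in bytes */
#define GUARD_FILL  0xFD  /* assumed fill pattern */
#define HEADER_SIZE 32    /* assumed: holds the stored size and the leading
                             guard, and is a multiple of malloc's alignment */

/* Allocate `size` usable bytes with a sentinel before and after them. */
static void *guarded_malloc(size_t size)
{
    if (size > SIZE_MAX - HEADER_SIZE - GUARD_SIZE)
        return NULL;                        /* total size would overflow */
    unsigned char *raw = malloc(HEADER_SIZE + size + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);        /* remember the requested size */
    unsigned char *user = raw + HEADER_SIZE;
    memset(user - GUARD_SIZE, GUARD_FILL, GUARD_SIZE);  /* leading sentinel */
    memset(user + size, GUARD_FILL, GUARD_SIZE);        /* trailing sentinel */
    return user;
}

/* Return 1 if both sentinels are intact, 0 if a write went out of bounds. */
static int guarded_check(const void *ptr)
{
    const unsigned char *user = ptr;
    size_t size;
    memcpy(&size, user - HEADER_SIZE, sizeof size);
    const unsigned char *lead = user - GUARD_SIZE;
    const unsigned char *trail = user + size;
    for (int i = 0; i < GUARD_SIZE; i++) {
        if (lead[i] != GUARD_FILL || trail[i] != GUARD_FILL)
            return 0;
    }
    return 1;
}

/* Release a block obtained from guarded_malloc. */
static void guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - HEADER_SIZE);
}
```

A write one byte past the end of a block lands in the trailing sentinel, so a
later call to guarded_check on that block returns 0. The extra header and guard
bytes are also why the actual allocated size is larger than the requested size.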
Practical consequences
· The difference between the DEBUG and RELEASE mode implementations
of the CRT (specifically, the special "small block" area that only
exists in RELEASE mode) means that the winheap walking APIs cannot
detect all memory leaks in a RELEASE mode process. The Win32 API
offers the HeapWalk API, which can walk the winheap(s) used by a process,
reporting every block in each winheap. Since in RELEASE mode the CRT uses the
special memory area for small blocks (<= 480 bytes), leaks of such small
blocks will not be detected by the HeapWalk-ing API in RELEASE mode!
This leak detection mechanism is only reliable in DEBUG mode! For RELEASE
mode, there is currently no known accurate mechanism for block-level leak
detection. The VirtualQuery Win32 API can be used, but it only reports at
memory region level: for instance, it will report the 64 Kbytes committed area
for "small blocks", but there is no way to determine how many blocks
are actually using that space or the space taken by those blocks. Therefore
only tests that leak small blocks repeatedly will show a leak
when using the VirtualQuery API. For instance, if a test leaks a
block of 400 bytes only twice and those allocations do not cause the CRT to
commit more memory for the "small blocks" area, no leak will be reported,
when in fact the process has leaked 800 bytes, which in the long run can be
fatal for some processes.
· The fact that the CRT creates the heapC it uses (= winheap, plus the
"small blocks" area in RELEASE mode) upon its initialization means that two
dependent binary components that use the CRT (for instance an EXE and its DLLs,
or two DLLs, one using the services of the other) should use the DLL version of
the CRT, for two reasons:
· memory allocations done by one component can be safely freed by
the other, since all components operate on the same memory areas. Should one
(or more) of the components be linked statically with the CRT, its copy of the
CRT initialization routine will create a separate, private heapC for it.
Consequently, a block allocated by another component cannot be freed by this
component, since that block is not in this component's heapC. The reverse is
also true: a block allocated by this component cannot be freed by any other,
since its address lies in this component's private heapC.
· memory usage optimization: the space taken in the process memory
will be smaller since one copy of the CRT is loaded for all components.
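The region-level limitation of VirtualQuery described in the first point above
can be made concrete with a little arithmetic. The sketch below is a portable
model, not a call to the Win32 API. It assumes the 64 Kbytes commit step quoted
earlier, assumes that small blocks are packed back to back with no per-block
overhead, ignores the 4 Mbytes area boundary, and uses the hypothetical names
committed_for and leak_visible_at_region_level.

```c
#include <stddef.h>

#define COMMIT_STEP (64u * 1024u)  /* commit granularity quoted in the text */

/* Bytes committed to hold `used` bytes of small blocks. */
static size_t committed_for(size_t used)
{
    size_t steps = (used + COMMIT_STEP - 1) / COMMIT_STEP;  /* round up */
    if (steps == 0)
        steps = 1;  /* the first 64 Kbytes are committed at start-up */
    return steps * COMMIT_STEP;
}

/* A region-level observer sees a leak only if the committed size changed. */
static int leak_visible_at_region_level(size_t used_before, size_t leaked)
{
    return committed_for(used_before + leaked) != committed_for(used_before);
}
```

Under these assumptions, leaking two 400 byte blocks into an empty area is
invisible (leak_visible_at_region_level(0, 800) is 0), and a test would have to
leak 164 such blocks (65,600 bytes) before the committed size grows.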
| CRT version | CRT library type | DEBUG Memory Behaviour | RELEASE Memory Behaviour                      |
| 4.x         | static           | heapC = winheap        | heapC = winheap (same as DEBUG)               |
| 4.x         | DLL              | heapC = winheap        | heapC = winheap + "small blocks" virtual area |
| 5.0         | static           | heapC = winheap        | heapC = winheap + "small blocks" virtual area |
| 5.0         | DLL              | heapC = winheap        | heapC = winheap + "small blocks" virtual area |
Important note: The described
behaviour refers to WinNT 4.0. On Windows 95, tests have shown that the CRT
always behaves the same in DEBUG and RELEASE mode, with heapC = winheap
(there is no special area for the "small blocks" in RELEASE mode). However, the
HeapWalk-ing API is not implemented in the Win95 implementation of Win32.