Memory management options in Win32
By George Mihaescu
Summary: this article presents the different memory management options available to Win32 developers.
As a Win32 application developer, you have a number of options for managing memory through the Win32 API. The diagram below displays them all, and the rest of the article briefly covers each one.

1. Global and Local memory API: this API (the GlobalAlloc / GlobalFree and LocalAlloc / LocalFree family of functions) is provided only for backward compatibility with Win16. In the Win32 implementation, both local and global allocations made through this API are mapped internally to heap allocation functions. You should not use this API when writing a new Win32 application (I list it here only for completeness).
2. The Windows heaps API: this API gives an application access to heaps. A Windows heap should not be confused with the C++ heap: the two terms are related (I'll explain how below), but they mean different things. A Windows heap is an object managed by the Win32 subsystem that represents one or more regions (not necessarily contiguous) in a process address space. When a Win32 process is created, the Win32 subsystem creates an initial heap for it, called the process default heap (its handle is returned by GetProcessHeap). All allocations made through the backward-compatibility Global / Local API (described above) are done in this heap, but more slowly and with more limitations than when the heap allocation functions are called directly. Allocations made with the COM IMalloc allocator (or its convenience wrapper function CoTaskMemAlloc) are also done in the process default heap. You can create additional heaps for your process with the HeapCreate function; these are called private heaps, and you are responsible for managing them (allocating and freeing memory in the heap, destroying the heap when you are done, etc.).
3. Virtual Memory API: this API offers the lowest level of memory management available to a Win32 application. It lets the application manipulate memory pages in the process address space. Allocation with this API is a two-step process: you first reserve a range of addresses in the process address space (which does not consume any physical memory or paging file space), then commit pages from the reserved range as you need memory (which is the actual allocation). Conversely, when you no longer need a committed area, you decommit it (which releases the physical storage behind it), and finally you release the reserved address range.
4. Memory Mapped File API: this API allows an application to map physical (disk) files, or portions of them, into its address space, or to create a memory area that is not backed by a disk file (in which case it is backed by the paging file). The typical use of this API is sharing memory between two or more processes. Access to the shared memory from multiple processes, whether it is backed by a disk file or not, must be synchronized using synchronization kernel objects such as semaphores, mutexes or events.
If your application uses the C Run-Time Library (CRT), you also have the CRT memory management API available (malloc, new, etc.). The CRT memory management functions are implemented on top of the Win32 heaps API and, in some cases, the Virtual Memory API. Upon initialization, the CRT creates its own Windows heap (i.e. a private heap of the process that uses the library), in which allocations made with new / malloc take place, but not always.

Versions 6 and below of the CRT have a special mechanism for addressing heap fragmentation. Fragmentation occurs when an application frequently allocates and frees small blocks of memory: the heap ends up with many small, non-contiguous free blocks, similar to disk fragmentation, so allocation requests become slow, or may even fail because no single contiguous free block is large enough to satisfy the request. Those versions of the CRT deal with this problem by internally managing a "small block memory area" (allocated through the Virtual Memory API) in which allocations below a certain size threshold are done, while larger allocations go, as expected, to the Windows heap that the CRT created when it was loaded into the process. Versions above 6 of the CRT no longer enable this mechanism by default, but it can be enabled with the _set_sbh_threshold function ("set small block heap threshold") that they provide. My article Analyzing the heaps of a Win32 process describes in detail how the CRT manages its memory and what you can do to improve its handling of fragmentation.
Following all this, a natural question is: "which one of these APIs should I use?" The answer depends on what you are trying to achieve. The table below lists a number of common scenarios and suggests the Win32 memory API best suited to each.
1. I want to implement my own memory management mechanism, to allow me to allocate / free blocks in a manner that is more efficient for my application.
2. I want to share memory between processes.
3. I want an easy-to-manage buffer of memory that can potentially grow very large.
4. I want to optimize the memory allocations produced by the C/C++ memory management functions.
I want to implement my own memory management mechanism, to allow me to allocate / free blocks in a manner that is more efficient for my application:
· Your first choice would be the Virtual Memory API: first call VirtualAlloc with the MEM_RESERVE flag to reserve an address range of a given size in the address space of your process. Your memory management implementation would then commit areas of this reserved range as memory allocation requests arrive (calling VirtualAlloc again, this time with the MEM_COMMIT flag), and decommit them with VirtualFree (MEM_DECOMMIT) when a piece of memory it previously allocated is no longer needed. Finally, when the originally reserved address range is no longer needed, call VirtualFree again (with MEM_RELEASE) to release it.
· Your second choice would be the Heaps API. This API is more straightforward, so beginners can tackle it more easily, and it has the advantage that on certain platforms the fragmentation issue is addressed. On Windows XP and above and on Windows Server 2003 (or Windows 2000 Server with a hotfix), Win32 heaps support an option called LFH (Low Fragmentation Heap), enabled through HeapSetInformation, that lets the Windows heap manager organize heap allocations so as to reduce fragmentation.
· The Memory Mapped File API is not a good candidate for this purpose, as it has more overhead (lower performance) than the other APIs and does not offer any additional benefit.

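I want to share memory between processes:
· The Memory Mapped File API is the answer here: create a named file mapping object (backed by a disk file, or by the paging file if you don't need a disk file), have the other processes open it by name, and map a view of it into each process. Remember to synchronize access to the shared memory using kernel synchronization objects such as mutexes, semaphores or events.
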
I want an easy-to-manage buffer of memory that can potentially grow very large:
· If you need a very large buffer (e.g. 100 MB) of which you may use only a portion, but you want the buffer to take up only the memory you are actually using, the Memory Mapped File API is the answer. The key here is that a view of a mapping that is not backed by a disk file is backed by the paging file, and only the portion that is actually used takes up physical memory.

I want to optimize the performance of memory allocation functions in the C/C++ library:
For long-running processes with random allocation patterns of varying block sizes, fragmentation of the CRT heap may become a performance problem. You can substantially reduce this problem by:
· Enabling the LFH (Low Fragmentation Heap) offered by the Win32 implementation on certain platforms: Windows XP and above, Windows Server 2003, and Windows 2000 with a specific hotfix. To enable this feature, call the CRT function _get_heap_handle (new in CRT 7.0, #include <malloc.h>), which returns the handle of the Win32 heap that the CRT uses for allocations through malloc / new. Then use HeapSetInformation to mark that heap as LFH. Pay attention here: HeapSetInformation is not available on platforms earlier than those listed above, so don't call it directly. Instead, call LoadLibrary("Kernel32.dll"), then pass the returned module handle and "HeapSetInformation" to GetProcAddress to get a pointer to HeapSetInformation, and call the function through that pointer. If GetProcAddress returns NULL, fall back on the following method.
· Enabling the fragmentation protection in the CRT itself: since version 4.0, the CRT has had a built-in mechanism against heap fragmentation. It performs allocations of blocks above a certain size in the regular heap, as expected, but performs allocations of blocks below that size in a memory area allocated and managed by the CRT (using the Virtual Memory API described above), called the "small block heap". More recent versions of the CRT don't enable this mechanism by default, but you can check whether it is enabled by calling the CRT function _get_sbh_threshold ("get small block heap threshold"). A return value of 0 means that the CRT fragmentation protection is off; to turn it on, call _set_sbh_threshold with the size in bytes that you want to be considered the threshold for a "small block". In CRT 4.0 this size was 420 bytes; you may want to start with that and experiment to see what works best for the allocation patterns of your application.