Memory management options in Win32
Author: George Mihaescu
Published: June 1, 2005
Category: Insight / Win32
Description: This article describes the various memory management APIs offered by Win32
and offers guidelines for choosing the appropriate one for your purpose.
By George Mihaescu
Summary: this article presents the different options
available to a Win32 developer for memory management.
Overview of the memory management APIs in Win32
As a Win32 application developer, you have a number of options
for managing memory through the Win32 API. They are listed below,
and the rest of the article briefly covers each one.

1. Global and Local memory API: this API (consisting of the GlobalAlloc / GlobalFree
and LocalAlloc / LocalFree families of functions) is provided only for backward
compatibility with Win16. Both local and global allocations made through this API are
internally mapped in the Win32 implementation to heap allocation functions. You
should not use this API in a new Win32 application (I list it here only
for completeness).
2. The Windows heaps API: this API gives an application access to heaps. A Windows heap
should not be confused with the C++ heap, as the two terms, although related
(I'll explain below), are semantically different. A Windows heap is an object managed
by the Win32 subsystem that represents one or more regions (not
necessarily contiguous) in a process address space. When a Win32 process is
created, the Win32 subsystem creates an initial heap for it (called the process
default heap). All allocations made through the backward compatibility
Global / Local API (described above) end up here, but they are slower and
more limited than allocations made with the heap allocation functions directly.
Allocations made through the COM IMalloc allocator (or the CoTaskMemAlloc
convenience wrapper function) also go to this process default heap.
You can create additional heaps for your process (using the HeapCreate
function), called private heaps, and you are then responsible for
managing them (allocating / freeing memory in the heap, releasing the heap,
etc).
3. Virtual Memory API: this API offers the lowest level of memory management available
to a Win32 application. It allows the application to manipulate memory pages in the
process address space. This API uses a two-step allocation: you first reserve
a range of addresses in the process space (which does not actually use any of the
available memory), then commit memory from the reserved range as you need
it (which does the actual allocation of memory). Conversely, when you don't need
the memory anymore, you de-commit the area (which releases the physical memory)
and finally release the address range that was reserved.
4. Memory Mapped File API: this API allows an application to map physical (disk) files
(or portions of them) into its address space, or to reserve an area of memory (even
if it is not backed by a disk file). The typical use of this API is to share
memory between two or more processes. When multiple processes access the shared
memory (be it backed by a disk file or not), the access must be
synchronized (using synchronization kernel objects, such as semaphores, mutexes
or events).
If your application uses the C Run Time Library (CRT) you
also have the CRT memory management API available (malloc, new,
etc). Note that the implementation of the CRT memory management
functions relies on the Win32 heaps API, and in some cases on the
Virtual Memory API. Upon initialization, the CRT creates its own Windows heap
(i.e. a private heap of the process that uses the library) in which the
allocations made through new / malloc take place, although not always.
Versions 6 and below of the CRT have a special mechanism for addressing the
problem of heap fragmentation (a phenomenon in which applications that
frequently allocate and de-allocate small blocks of memory can produce heaps
that have many small non-contiguous free blocks, similar to disk fragmentation,
such that allocation requests start to be quite slow, or may even fail
because there isn't one contiguous free block available to satisfy the
allocation request). Those versions of the CRT deal with this problem by
managing internally a "small block memory area" in which allocations
below a certain threshold are done (through Virtual Memory API), while
allocations of larger size go, as expected, in the Windows heap that the CRT
has created when loaded in the process. Versions above 6 of the CRT don't have
this mechanism anymore by default, but it can be enabled using the _set_sbh_threshold
function they offer ("set small block heap threshold"). My article Analyzing
the heaps of a Win32 process describes in detail how the CRT manages its
memory and what you can do to improve its memory fragmentation handling.
More details
Following all this, a natural question would be: "which
one of those APIs should I use?" The answer depends on what you
are trying to achieve. The list below covers a number of common scenarios and suggests
which Win32 memory API is best suited to each.
1. I want to implement my own memory management mechanism, to allow me to allocate / free
blocks in a manner that is more efficient for my application.
2. I want to share memory between processes.
3. I want an easy to manage buffer of memory that can grow potentially very large.
4. I want to optimize the memory allocations produced by the C/C++ memory management
functions.
I want to implement my own memory management mechanism,
to allow me to allocate / free blocks in a manner that is more efficient for
my application
· Your first choice would be the Virtual Memory API: first call VirtualAlloc
to reserve an address range of a given size in the memory space of your
process. Your memory management implementation would then commit areas of
this reserved space when there are memory allocation requests (again using VirtualAlloc,
but with a different flag), and de-commit them with VirtualFree when a piece of
memory that it previously allocated is not needed anymore.
Finally, when the range of addresses originally reserved is not needed
anymore, call VirtualFree again to release it.
This is not very complex, but does require a good level of experience with
memory management techniques (among other reasons, because of the potential
for memory fragmentation), so it's not something I would recommend for
beginners. It does offer, however, the best memory (de)allocation performance
available to applications.
· Your second choice would be the Heaps API. This API is more
straightforward, so beginners can tackle it more easily, and it has the advantage
that on certain platforms the fragmentation issue is addressed. On XP and
above and on Windows 2003 Server (or 2000 Server with a hotfix), Win32
heaps support an option called LFH (Low Fragmentation Heap) that lets the Windows
heap management code organize heap allocations so as to reduce
fragmentation.
Start by creating your private heap using HeapCreate. Consider using HeapSetInformation
to enable LFH and take advantage of the anti-fragmentation algorithms
implemented on the platforms specified above (depending on how you do your
allocations and de-allocations, this may or may not be needed). Then use HeapAlloc
to allocate and HeapFree to free blocks within this heap. Finally, use
HeapDestroy to get rid of the heap when it is not needed anymore.
The Heaps API is slower than the Virtual Memory API, but it is substantially
easier to implement a private memory manager on top of it.
· The Memory Mapped File API is not a good candidate for
this purpose, as it has more overhead (lower performance) than the other APIs
and does not offer any additional benefit.
I want to share memory between processes
To share memory between any number of processes you should be using the
Memory Mapped File API.
Any of the processes involved can start by creating the file mapping using CreateFileMapping,
passing INVALID_HANDLE_VALUE as the handle of the file to map. This
means that you are not mapping a disk file into memory; rather, the mapping
is backed by the system paging file. Pass a name to the function as the last
parameter – this would be the name that other processes must use to map the
same memory to their process space. Then call MapViewOfFile on the
handle returned by CreateFileMapping (committing the memory) to get a
pointer to the mapped memory created. You can use this pointer to write to it
sequentially; as you write to it, physical memory would be used as needed,
with no effort on your part – this is a very important feature of this API
because it hides the complexity of allocating memory and allows you to write
to the mapped memory as if it's a potentially very large buffer, but one that
only takes up the RAM for the size of data that has been written to the
buffer.
Any other process would call OpenFileMapping using the same name
passed when the mapped memory was created, then use MapViewOfFile to
get a pointer to that memory.
When they are done, all processes involved must call UnmapViewOfFile with the
pointer they received from MapViewOfFile, then call CloseHandle
to close the mapping object handle obtained through CreateFileMapping or OpenFileMapping.
Having multiple processes access the shared memory obviously raises issues of
concurrency, so you will have to implement your own synchronization mechanism.
What I usually do is create a DLL that wraps the access to the shared memory
and takes care of the synchronization, then use the API exposed by this DLL
in all the processes involved.
I want an easy to manage buffer of memory that can grow
potentially very large
If you need a substantially large buffer (e.g. 100 MB)
out of which you may use only a portion, but you want the buffer to take only
the memory you are currently using, again the Memory Mapped File API is the
answer. The key here is the fact that a view of mapped memory is backed by the
paging file, and only the portion that is actually used takes up physical memory.
Just as above, start by creating the file mapping using CreateFileMapping,
passing INVALID_HANDLE_VALUE as the handle of the file to map. This
means that you are not mapping a disk file into memory; rather, the mapping
is backed by the system paging file. Pass NULL for the name of the mapped object,
as you will not need to share it with other processes. Then call MapViewOfFile
on the handle returned by CreateFileMapping (committing the memory) to
get a pointer to the mapped memory created (pass 0 for the view size to get a
view the size of the entire mapping). You can use this pointer to write to it
sequentially; as you write to it, physical memory would be used as needed,
with no effort on your part – this is a very important feature of this API
because it hides the complexity of allocating memory and allows you to write
to the mapped memory as if it's a potentially very large buffer, but one that
only takes up the RAM for the size of data that has been written to the
buffer.
This effectively creates a buffer that can keep growing (to the size
specified when the mapping was created), but that takes only the physical RAM
that is currently being used.
When done with this buffer, call UnmapViewOfFile using the pointer you
received from MapViewOfFile, then call CloseHandle to finally
close the mapped memory object obtained through CreateFileMapping.
This is a very simple and effective technique to create a buffer that can be
potentially very large, but that does not take physical RAM for more than it
actually needs.
I want to optimize the performance of memory allocation
functions in the C/C++ library
Depending on the version of the Microsoft CRT you are using, you may be able
to improve the performance of the malloc / new families of allocators.
For long-running processes that have random patterns of
allocation of varying block sizes, memory fragmentation of the CRT heap may
start to become a performance problem. You can substantially reduce this
problem by:
· Enabling the LFH (Low Fragmentation Heap) management offered by
the Win32 implementation on certain platforms: XP and above, Windows Server
2003, Windows 2000 with a specific hotfix. To enable this feature, call the
CRT function _get_heap_handle (new in CRT 7.0, #include <malloc.h>).
This returns the handle of the Win32 heap used by the CRT for memory
allocations through malloc / new. Then use HeapSetInformation to mark
the heap as LFH. This is where you need to pay attention, because the API
call HeapSetInformation is not available on platforms earlier than
those listed above, so don't call it directly: rather call LoadLibrary
("Kernel32.dll") then GetProcAddress
("HeapSetInformation") to get the function pointer to HeapSetInformation
and call it. If you get a NULL pointer back from the GetProcAddress
call, you'll need to fall back on the following method:
· Enabling the fragmentation protection in the CRT itself: since
version 4.0, the CRT has had a built-in mechanism against heap fragmentation.
This mechanism is based on doing allocations of blocks over a certain size in
the regular heap, as expected, but doing the allocations of blocks below that
size in a memory area allocated and managed by the CRT (using the Virtual
Memory API that I've described above), called the "small block
heap". More recent versions of the CRT don't have this mechanism enabled
by default, but you can check whether it is enabled by calling the CRT
function _get_sbh_threshold ("get small block heap
threshold"). If it returns 0, the fragmentation protection
in the CRT is not on. To turn it on, call _set_sbh_threshold with the
size in bytes that you want to be considered the threshold for a "small
block". In CRT 4.0 this size was 420 bytes; you may want to start with
that and experiment to see what works best for the allocation patterns of
your application.